Live Action Mafia

A game of sneakiness and paranoia

All times are UTC - 5 hours




 Post subject: achester's thoughts
Posted: Thu Jan 25, 2018 11:08 am
The other spymasters have agreed not to read this until after the game.


 Post subject: Re: achester's thoughts
Posted: Thu Jan 25, 2018 11:11 am
Day 1 thoughts:

Ack; today was a bad day. Win probabilities 1:1 red, 1:1 green, 1:2 blue.

Late clue posting:
-aok thought that day rollover was at 23:00, and said he hadn't even started thinking about his clue at 22:00. (He'd had plenty of time to (e.g. he spent the previous several hours playing Factorio with me), and it was posted in forums and in the rules.) The maximally rules-lawyery response would be for him to not give a clue, but that'd ruin the game, so we gave him until at most 23:00.
-I envy how great a clue he came up with in 15 minutes. I was putting a lot less effort into this game than the last one (I'm sort of burned out already, not a good sign), but still put in ~3 hours and made a 38x27 grid of judgments.
-Both other spymasters said they were originally looking for clues for the wrong team. Ouch! They've both played twice before, so they know the color cycle, and it's in the rules.
-I got to see Daniel Grazian's clue for green (fortunately, he posted it prematurely, so I accidentally saw it and could warn him he was cluing the wrong color). It was TREASURE for 9. I'd been thinking of money-related clues, planning to wait for tomorrow because they all hit DIAMOND on ksedlar strongly. TREASURE's a bit better than those because it gets CHEST better. I ended up not going with them (see the section "Clue number: 1, 4, or 12?"), but now think I should just have stuck with the big-clue-with-lots-of-bystanders approach. As it turns out, DIAMOND wasn't even a problem because ksedlar's BOND was clued so strongly that she's basically proven red; also, making her less trusted'd've been good, not bad.
-I thought cluing TOWER was going to be fine because it would give jamb a chance to maybe infiltrate green, which she'd been musing out loud about doing. Since game started and color info went out with the clues unposted (which explicitly wasn't supposed to happen), she claimed blue before seeing that she'd've had an opportunity to claim green, so aok's delay actually cost me something. I did agree to giving him some extra time, so I won't do anything about it, but it's really annoying that aok's being late cost my team and not his.

PALANTÍR vs PALANTIR, the clue:
-There wasn't consensus in spymaster chat over whether to allow the accent; aok thought it shouldn't be allowed; Daniel Grazian thought it should be. (Arguments for: that's the Wikipedia page title; it's a single word cluing things semantically. Arguments against: English removes accents when it imports words unless proven otherwise; the "Wikipedia page titles are allowed" rule only applies to proper nouns, which PALANTÍR isn't.) We brought in lukesci, who ruled in favor of PALANTÍR.
-My reason for wanting the accent was to get people to look at the LoTR thing instead of the company; there were way too many bystanders for the company, so I'dn't've even given the clue PALANTIR. It'd actually have been better for me if PALANTÍR had been forbidden, since it turned out to be a mistake.
-jamb: "so in case you are wondering I noticed the accent
I am assuming that means "look at the LOTR thing not the company"
but I don't particularly believe anyone in this room is blue, so I'm going to try to keep them on that track
"
Phew! Yes, I was wondering; yes, that's correctly interpreted, and you even guessed correctly that no one in the room is blue.
-jamb, later: *is confused why there are only about 8 words she thinks fit PALANTÍR at all, since she's used to much bigger-numbered clues that hit many more things*
*decides to include the company, and promptly adds about 20 more bystanders* [1] D:
-My actual thoughts on PALANTIR the company were just:
(1) It makes jamb look a bit bluer, since she's going to work there and so would be able to interpret the clue (and she pointed this out once in discussion),
(2) The clue is for a small enough number that my team should hopefully hit the LoTR things and then, when they're done, take the company things as anticlued (which is what a clue for a small number would usually do).

PALANTÍR vs PALANTIR, the opsec:
-jamb is trying to avoid spreading the realization that there's an accent in the clue. I think this is a mistake. The total cost of the mistake is really small, but I did a bunch of calculations on it for fun.
-An estimate of how valuable it is to blue to spread that realization: Let ε<1 be the probability that each player doesn't realize that there's an accent. (I'd guess ε is about 0.1; it seems like jamb's estimate is higher.) (Yes, it's probably different for different players, but close enough.) There is, and Jackie correctly believes there is, at least one blue person who hasn't contacted her, who therefore has an ε chance of wasting all their shots, for an expected value of +4ε if jamb posted on the forum "There's an accent there, which means LoTR; pay attention to it". For the other teams, what's at stake is slightly more information about colors from a spymaster who isn't theirs.
--As an upper bound on the usefulness of that realization to another team, suppose that another person, with their 4 expected shots, uses that realization to move all 4 of them from the wrong color to the right color. Their shooting at a word of the wrong color is worth at most .125 to us (a 1/2 chance that the shots are against green times a 1/4 chance that an unclued shot hits); their shooting at words of the right team that were weakly-enough clued is similarly worth -.125 to us, for a total swing of 1 shot's worth. There are 6 people on other teams, so that's a hard upper bound of 6ε in cost of posting it publicly.
--People on other teams are either singletons (who haven't contacted their team) or not. People on other teams who aren't singletons have at most an ε^2 probability of not noticing the accent, which I'll round off to 0. If each other team is doing about as well as contacting each other as blue is, that'd be two singletons on other teams, for a cost upper bound of 2ε to just posting it publicly. That's already less than the 4ε gain, so it's definitely correct to post it publicly.
--Singletons have the full ε probability of not noticing the accent, but they're also likely to be people so loosely involved in the game that they'dn't solve the other teams' clues just to get information about colors. I won't assign a value to this to maintain the upper-bound property, but that makes the cost even less.
-As an alternate observation of how valuable "slightly more information about colors from a spymaster who isn't theirs" is, although jamb has come up with answers to the other spymasters' clues as the first step in using that information, she hasn't even used that information for herself yet; it looks like my team's shots for today are being determined entirely by my clue and not the other spymasters'. She'll get around to building a proper Bayesian update system to use that information eventually [edit: she built it as I was writing this :)], but if even she (who, along with ksedlar, is one of the most likely people in game to use that information) hasn't even used it yet for today's shots, that information can't be very valuable.
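
A minimal sketch of the comparison above, in case the arithmetic is easier to check as code. The function and parameter names are made up for illustration; the numbers (4 expected shots, a swing of .125 + .125 per shot, 6 players on other teams, 2 assumed singletons, ε about 0.1) are the ones used in these notes.

```python
from fractions import Fraction

def posting_tradeoff(eps, shots=4, swing_per_shot=Fraction(1, 4),
                     other_players=6, other_singletons=2):
    """Return (gain, hard_bound, singleton_bound) in shots' worth.

    eps: chance that a given player misses the accent (a guess, not a measurement).
    swing_per_shot: .125 lost for a wrong-color shot plus .125 gained back.
    """
    gain = shots * eps                          # one uncontacted blue stops wasting 4 shots
    cost_per_player = shots * swing_per_shot * eps
    hard_bound = other_players * cost_per_player        # everyone on other teams benefits
    singleton_bound = other_singletons * cost_per_player  # only singletons can still miss it
    return gain, hard_bound, singleton_bound

gain, hard_bound, singleton_bound = posting_tradeoff(Fraction(1, 10))
print(gain, hard_bound, singleton_bound)  # 2/5 3/5 1/5
```

The hard bound alone exceeds the gain; it's only the singleton argument that brings the cost below 4ε, and the comparison doesn't depend on the value of ε.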

-jamb had a conversation with brunnerj in which brunnerj listed RING as a weak fit for the clue and jamb asked why it was on his list at all (pretending not to know about the accent, because she's fairly confident he's nonblue). I'll be curious to hear, after the game, whether he
(a) didn't see the accent,
(b) saw the accent, but believes Jackie didn't, or
(c) saw the accent, and believes that Jackie did too (and so probably learned that Jackie doesn't believe his claim).
((d), that brunnerj is blue, is a possibility I can rule out but Jackie only thinks is very unlikely).
I bet 1:2:4 (a):(b):(c). Jackie has only explicitly mentioned (a), (b), and (d); I don't know whether she thought of (c).
Anyway, the potential gains for hiding the accent from him were tiny (brunnerj, at least, is surely in contact with someone from his team, so even if he managed not to see it, they probably did); if he is blue the cost of hiding it is significant (as above); and, as the only cost I think was actually relevant (other than the time spent), Jackie probably proved to brunnerj that she didn't believe his claim.

Clue number: 1, 4, or 12?
-Let DAC[n] be a DAC game in which each team has n shots per day. LAC went from DAC[9] to DAC[3] or so; DAC1 was DAC[6]; DAC2 is roughly DAC[12]. In DAC[1], all that matters is cluing very precisely; a shot against a random word is wasted. In DAC[\infty], all that matters is cluing who the other team is; once you have that, you can shoot random words until they're dead. DAC[6] is pretty similar to DAC[1] in that my team's shots can't keep up with all the True Names I could clue; I could give precise enough clues that my team uses most of the shots they have available on True Names. I claim that DAC[12] is pretty similar to DAC[\infty] in that my clues can't keep up with my team's shots, so there's substantial value in at least making sure the extra shots hit the right color (0.25 True Names for a random shot on the right color vs -0.125 for a random shot on the wrong color).
-A clue for 1 has an expected value of 1+11*.25 = 3.75 True Names hit: the one clued word is definitely hit, and the other 11 shots are random.
-A clue for slightly more than 1 gains .75 per extra True Name it hits (e.g. a precise clue for 4 is worth 6), but risks hitting people on the wrong teams and causing lots of extra shots to go to waste.
-A clue for a number like 12 probably hits enough bystanders that any extra shots are wasted (on possibly even the wrong color). If it gets 4 correct, it beats a clue for 1; if it gets 6 correct and splits the others evenly (which would be an amazing clue), it beats a precise clue for 4.
-aok seems to have succeeded at the precise-clue-for-4 thing: His team should be able to hit 4 things correctly and 8 names that are at least of the right color.
-I got too greedy. I figured that, as long as at least one True Name (RING) were clued strongly enough, any extra shots my team made would be at least as strong as correct-color bystanders, so I could afford to include a few other True Names. But, as in note [1] above, my team (well, Jackie, the third of my team from whom I've heard something so definitive) doesn't understand why the clue is so small and is just treating it as a clue for infinity, reaching into things like the company which I'd hoped would be anticlued by that the clue number was so small.
-Daniel Grazian went with the large-number strategy. It really hits a lot of bystanders and a lot of his things are stretches, but it hits enough things correctly to be a pretty good clue anyway. I think he and aok clued pretty similarly well, and I'm about half a clue-day behind them. :(
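
As a rough sketch of the precise-clue arithmetic above (the function name is made up; it assumes 12 shots per day, that every clued word is hit, and that every leftover shot is a random shot on the right color worth 0.25 True Names):

```python
SHOTS_PER_DAY = 12          # DAC2 is roughly DAC[12]
RANDOM_RIGHT_COLOR = 0.25   # True Names hit per random shot on the right color

def precise_clue_value(clued, shots=SHOTS_PER_DAY, p_random=RANDOM_RIGHT_COLOR):
    """Expected True Names hit by a precise clue for `clued` words."""
    return clued + (shots - clued) * p_random

print(precise_clue_value(1))  # 3.75
print(precise_clue_value(4))  # 6.0
# Each extra precisely-clued True Name is worth 1 - 0.25 = 0.75 more.
print(precise_clue_value(2) - precise_clue_value(1))  # 0.75
```

This leaves out the downside of big clues (bystanders and wrong-color shots), which is the part that's actually hard to estimate.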


AdamYedidia vs jamb:
-In the last game, jamb was strongly clued as non-blue, but my team decided to trust her for the first day anyway, pending a parity check on her. I think that was risky (she could've wasted the parity check and all their shots), but for some combination of their correctly psychreading jamb and that it ended well, I didn't comment on it.
-In this game, jamb is strongly clued as non-blue. jamb tried to muscle her way into dictating the parity check this game too, but AdamYedidia is the bluest person and knows it and won't go along with psychreads, so that didn't go well: AdamYedidia said " if I may channel the Spirit of Pesto [correctly channeled --Pesto]
if indeed you are blue, you should just always do whatever I, the most trustworthy blue person, say"
Jamb did indeed go along with it, and then they had a mostly reasonable exchange, except:
-AdamYedidia and jamb had basically agreed to use {0, 0 mod 3, 1 mod 3, 2 mod 3} to clue jamb's and corwind's blueness. jamb proposed setting 0 to "Jackie is not blue and corwind is blue", wanting to avoid forcing me into cluing for 0. AdamYedidia didn't like that because he thought that outcome was too likely, and so proposed setting 0 to "Jackie and corwind are both blue", which he thought was least likely. They argued over that for a while; neither of them suggested making 0 "Neither Jackie nor corwind is blue", which seems like an obvious choice (Jackie's just as happy with it because she knows it won't be used; AdamYedidia should be about as happy with it because if neither of them is blue, he'd have trouble counting to three blues).
-Anyway, tomorrow Jackie'll be proven blue, and AdamYedidia will (I'm almost sure) immediately pledge fealty to her as the most trustworthy blue person, so I think the only real harm this causes is that they don't get the "corwind isn't blue" bit (because jamb pointed out (correctly) that if she's proven good by {1 mod 3, 0}, she gets to dictate, and so decreed that both of those just mean "jamb is good" to remove the risk I'd have to clue for 0).

The draft:
-jamb has observed that the partition thing is equivalent (for this number of players) to dividing into groups of 3 and cycling the spymasters' picks within them (that is, that there are only 6 possibilities). She's concerned that this basically leaves her only 15 possible teams, since she's pretty sure she won't be on a team with ksedlar or brunnerj. That is a pretty low number, but only one bit of information beyond the 28 possible teams she could have a priori.
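
A quick check of the "one bit" claim, as a sketch. It assumes my reading of where the numbers come from: 9 players in teams of 3, so jamb's team is a choice of 2 teammates from the other 8, and ruling out ksedlar and brunnerj leaves 6 candidates.

```python
from math import comb, log2

OTHER_PLAYERS = 8   # assumption: 9 players total, teams of 3
RULED_OUT = 2       # ksedlar and brunnerj

a_priori = comb(OTHER_PLAYERS, 2)               # 28 possible pairs of teammates
remaining = comb(OTHER_PLAYERS - RULED_OUT, 2)  # 15 once two players are excluded
bits = log2(a_priori / remaining)               # information gained by the exclusion

print(a_priori, remaining, round(bits, 2))  # 28 15 0.9
```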

Color info:

Jackie built a fancy Bayesian color probability updater spreadsheet. It all looks good. It gets the top 3 blues, the top 3 reds, and the top 2 greens correct; the only thing it doesn't have is Python for green, and I'ven't much clued him. Great! Sadly, my team isn't relying on it today for their shots, so some of them are on nongreens, but it should be better tomorrow.


Goals for today:
-Clue for 1 mod 3, to confirm AdamYedidia and jamb to each other.
-Decide how strongly to clue Python and Lucy. Python's the only green I can really try to make seem not green, because jasonye and corwind are more strongly clued and my best clues today are stronger for them. He's also the one I least want green to trust to organize them. Lucy's infiltrating green; my instinct is that that's a bad idea, but I shouldn't get in the way of my players too much. On the other hand, my team did waste a bunch of shots either on or by Lucy today, and I'd like to get them to stop that. Also, green's not at all the team I'm most scared of, so I don't care as much about infiltrating them. I think I'll ignore Lucy's words (too much to think about) except maybe her True Names, and treat all green True Names as equal (which likely means that Python won't be strongly clued today).
-Treat basically any connection as strong enough to include in a clue, since my team'll have gajillions of shots available, and this'll be their first real queue-forming day.


Clues:
The top three for today are:
3. CUIRASSIER for 10 (SOLDIER because they're soldiers, CHEST because the armor they're named after is chest armor, SUIT because they're named after their suits of armor, LIMOUSINE because it's a French word for mostly-French soldiers and Limousine's a state in France (and Jackie's heard that this is my first association for "Limousine"), DRAFT because soldiers are drafted, POLICE because they're people who use force sometimes, NURSE because they need medical attention after a battle, KETCHUP because they're cavalry who catch up to their enemies, GRASS because they charge across grass, and GOLD because the cuirasses might be made of it in a game.)
2. TSO'S for 16 (SOLDIER because generals command them, CHEST because it gives you a heart attack or is made of chicken chests, PIE because it's a food, KETCHUP because it's a food, NUT because it's a food, FISH because it's a food, DRAFT because generals draft people, POUND because generals pound their enemies or you gain pounds by eating it, NURSE because people need medical attention after generals have commanded them in battle, SUIT because the soldiers generals command wear suits of armor, PLATYPUS because it's a fleshy creature from the same hemisphere, KID because that chicken's popular with kids, MATCH because it's fried food, TORCH because it's fried food, POLICE as another user of force, and STRING because string cheese is a food). As a bonus, hits CHINA for extra greenness on corwind.
1. MIDWICKET for 10 (CENTER because mid, BAT because the players use them, GRASS on which cricket's played, MATCH as an instance of cricket, PLATYPUS via Australia where cricket's played, DRAFT because players are drafted for cricket, KETCHUP because they catch balls, POUND as the currency where midwickets were invented, SUIT because the players wear protection, and CHEST as a person's midsection). As a bonus, gets CRICKET, proving jasonye green.


 Post subject: Re: achester's thoughts
Posted: Thu Jan 25, 2018 10:28 pm
Day 2 thoughts:

Summary:
Recovered somewhat today, despite a lot of missed connections on MIDWICKET. HP at the start of today was 24/23/22 R/G/B; my guess for the end of the day is 21/19/18 R/G/B, and red-cluing-blue has a bit of a disadvantage saved up (which I'd estimate to be worth about -0.5HP) in that Daniel Grazian owes three clues tomorrow. Win probabilities 1:1 red, 1:1 blue, 1:2 green.

Color information:
-Jackie's been worried about whether she's using her color information correctly, or whether I think she should know better who's on what team. Yep and nope, respectively: her color guesses are all good (all her top guesses are correct except for not being sure which of corwind and Lucy is blue and which is green) and about as good as they should be given her info. (She has missed a few connections I think are obvious for some of my clues, but spymasters always think that. Currently it's CENTER for MIDWICKET.)
-Edit: saw CENTER, and now all her color guesses are correct.
-Today's parity check locks this in: it'll confirm that all the colors are correct.

PALANTÍR:
-The three strongest ones (MARBLE, ROUND, RING) were shot twice. Oops. (Miscoordination due to AdamYedidia's being in another time zone.) ROCK and NOVEL I thought were both strong; I included NOVEL but not ROCK. In retrospect, ROCK was clearly better. Oh well.

MIDWICKET:
-My team's missing DRAFT, KETCHUP, SUIT, PLATYPUS, MATCH, CHEST, and POUND; didn't even mark any of them as possibilities. A bit worse than I'd hoped, but reasonable; I'm going to write off all of them except MATCH as basically unclued.
-Edit: Lucy saw MATCH, and my team's shooting it. Yay!

Goals for today:
-0 mod 3 tells my team that they have all the colors exactly correct.
-Big number tells my team to mostly ignore their queue. (The correct things in it will be at most GRASS, and they might shoot even that.) (Edit: Nope, queue is entirely wrong.)

Clue for today:
-TSO'S from yesterday fits everything I want perfectly. I just have to decide which of the weakest things to throw out of it to reduce it to 15: NURSE, PLATYPUS, KID, POLICE, or STRING. Gosh, are those all weak, but random TSO'S bystanders are better than the known MIDWICKET bystanders. I think PLATYPUS is weakest.
-Edit: actually, I don't even get to throw that out, since my team saw and shot MATCH.


 Post subject: Re: achester's thoughts
Posted: Fri Jan 26, 2018 10:09 pm
Day 3 thoughts:

Summary:
HP at the start of the day was 20/19/19 R/G/B (I'd predicted 21/19/18). I predict 16/15/14 R/G/B at the end of today, but with blue hopefully having a better queue built up. Win probabilities 1:1 red, 1:1 blue, 1:2 green.

TSO'S

jamb is wondering how many of the connections of TSO'S (general->military, proper names, chicken->animal, food, China, "so", "sews", "sows") I thought of/used. I considered the first five; no proper names or China things were actually clued by it (e.g. Beijing is a bystander). The homophones I thought were weak because I pronounce the T in TSO'S; oh well.

The two big things I'm sad my team didn't think of about TSO'S are:
-the word KETCHUP. I think of general Tso's chicken as a fast-food-y food court item, so I thought it'd be a strong clue for KETCHUP. The next-easiest way to clue KETCHUP is as a homophone of CATCH UP, but my team didn't think of that. Also, KETCHUP is a hard word to clue, so clues that could touch it should have more weight. One system (which I heard of from last game's green team) that'd be good for this is to list, somewhere, a few meanings of each word (e.g. "fast food", "catch", "red") and look through those for each clue. One benefit of this is to reduce missed connections by checking each clue against all the listed meanings; it might not be the optimal way to spend time avoiding missed connections, and my team's usually been pretty good at getting connections, but it'd be something. The other benefit is to see which words are hard to clue for having few meanings or hard to avoid cluing.
-The fact that a big number likely means their old queue was bad. Their PALANTÍR queue was entirely incorrect, but they're shooting at two things from it; their MIDWICKET queue was really bad, but they're shooting at two things from it and left the top of their queue from it. Due to overlap, that still means they're shooting at 10 TSO'S things, so things are pretty good for today on that front. But the queue is again awful (the correct things on it are the 7th, 13th, 20th, and 21st things), so I'd like to be able to give them a big clue again today to have them throw it out, but it's getting harder to give really big clues and I'm not sure it'd even get them to throw out their queue.


PARITY CHECK

Today's parity check is "clue 0/1/2 mod 3 if the first correct word among those is in {AIR, FAIR} {JUPITER, KID, LOCH NESS} {NOTE STRING TORCH WORM}". (I had to check that this was even legal; we seem to basically be allowing anything that's at most one trit per day.) I think this is a reasonable system, but the set sizes should be 1/2/6 rather than 2/3/4.

On a walk, Jackie was doing calculations out loud about what the numbers should be. She asked me to record here whether I was glad that she did the calculations instead of relying entirely on her gut: yes, definitely.

Her calculation makes two approximations: that two of those nine words are correct, and that the goal is roughly to split them into three equal-probability sets. Assuming both of those, she gets that the last set should be of size 5 (so, probability 20/72); size 6 (probability 30/72) is slightly farther, and the first two sets should be of size 2.

Those 9 words are python2's; my team is (correctly) pretty sure that either he or another spy (I forget which) still has 3 HP, so he has either 2 or 3 words left. Jackie estimated a half chance of each. Doing the calculation with that:
-If the first set is of size 1 and python2 has 2 words left, it has probability 2/9; if the first set is of size 1 and python2 has 3 words left, it has probability 3/9, for a total probability of 5/18 if the first set has size 1.
-If the first set is of size 2 and python2 has 2 words left, it has probability 5/12; if the first set is of size 2 and python2 has 3 words left, it has probability 7/12, for a total probability of 1/2. That's much farther from 1/3, so the first set should be of size 1.
-If the last set is of size 5 and python2 has 2 words left, it has probability 5/18. If the last set is of size 5 and python2 has 3 words left, it has probability 5/42, for a total probability of 25/126.
-If the last set is of size 6 and python2 has 2 words left, it has probability 5/12. If the last set is of size 6 and python2 has 3 words left, it has probability 5/21. That's a total probability of 55/168, which is almost exactly a third. So the last set should have size 6.
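
The set-size probabilities above can be recomputed with a short sketch. The function names are made up, and it assumes what the calculation above assumes: the 9 words are checked in a fixed order, python2 has 2 or 3 True Names left with a half chance of each, and every placement of those True Names among the 9 words is equally likely.

```python
from fractions import Fraction
from math import comb

WORDS = 9  # python2's words covered by the parity check

def p_first_set(size, true_names, words=WORDS):
    """P(the first True Name in list order falls within the first `size` words)."""
    return 1 - Fraction(comb(words - size, true_names), comb(words, true_names))

def p_last_set(size, true_names, words=WORDS):
    """P(the first True Name falls in the last `size` words, i.e. all of them do)."""
    return Fraction(comb(size, true_names), comb(words, true_names))

def averaged(prob, size):
    """Average over the 50/50 guess that python2 has 2 or 3 True Names left."""
    return (prob(size, 2) + prob(size, 3)) / 2

for label, prob, sizes in (("first", p_first_set, (1, 2)),
                           ("last", p_last_set, (5, 6))):
    for size in sizes:
        p = averaged(prob, size)
        print(f"{label} set of size {size}: {p} ~ {float(p):.3f}")
```

The target for each set is 1/3; whichever size lands closest is the one to pick under the equal-probability goal.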

The other important correction is that not all information is equal: for instance, "clue the ternarity of the SHA-1 hash of the concatenation of all the true names" is very close to perfect information-theoretically, but useless. I think the most relevant piece of this correction comes from that shots are expensive, so finding out something isn't a True Name is worth 0.25 True Names hit (because the random shot can go somewhere else), but, say, finding out that a set of 4 Names contains exactly one True Name is useless, because those shots are as good as random. In this case, my team's going to find out that a set of 3 Names contains at least one (in fact, exactly one) True Name, which is a savings of only 0.25. I think this biases you toward wanting the first two sets to be smaller (so they're worth shooting at if you get them), although I doubt it reduces the optimum all the way to 1/1/7.

While I was writing this, Jackie made the update to 1/3/5. That's better in expected value, so certainly the correct decision (well, a more correct decision; 1/2/6 would still have been better), although it turns out that we gain a bit less from it because they'll hit exactly one name in the middle set of 3 either way, and the old way would've eliminated one more bystander.

The new check is 0 if {FAIR}, 1 if {AIR, JUPITER, or KID} but not 0, and 2 if {the remaining 5} (mod 3)

I could maybe give 2 mod 3 and clue KID strongly, but they might think I was just taking advantage of KID's being proven non-True to give a clue that I'd otherwise wanted to give for them, or they might think I didn't like the whole system (and it is good, just not quite optimally used, although if they switched to a different system it might be a queue-permutation one I think I'd like better), so I'll just follow it, answer 1 mod 3, and be slightly sad.

corwind substitute

I'm a bit annoyed about the corwind substitution thing today. On day 1, my team lost 3 shots to jamb and Lucy submitting the same words because they both told AdamYedidia they were shooting them, but he wasn't around (and knew he wouldn't be around) for the last 10 hours of day to unconflict. Today, aok's team was going to lose 3 shots to corwind not being around for the last ~10 hours of the day (in his case, the information from teammates making those shots useless was not that those names had already been fired on but that the person he was firing on was outed), but aok's getting corwind replaced by cmcclena. It is good for people's personal lives to not hurt the game, but the disparity in treatment of very similar situations is sad. (Also, what if corwind had been a traitor just using the excuse of being offline to submit useless shots? This technically leaks information.) I think aok's team is going to lose anyway (they're certainly behind), at least.

Goals for today:
-1 mod 3, because I have to, even though it's sadly little info.
-Clue for a big number, hoping to get rid of the queue.
-Hit PIE to out corwind1.
-Hit SCIENTIST to out corwind3.
-Hit NOVEL to finish clue 1.
-If I hit NOVEL, also hit NUT to kill jasonye1
-Hit GOLD to out python1.

Clue:
-EUCLID'S for 4 (PIE because geometry uses pi, SCIENTIST because Euclid's one, NOVEL because Euclid's Elements is a book, GOLD because it's an element). Only for 4 instead of 7, even though I could include DRAFT (of a book), KID (who uses it), and STRING (because of the apostrophe) to reduce the number of bystanders they'll hit from it; they'll probably hit WATER and PYRAMID as it is, and big numbers haven't been successful at clearing their queue anyway.


 Post subject: Re: achester's thoughts
Posted: Sat Jan 27, 2018 10:24 pm
Day 4 thoughts:

Summary, start of day:
HP at the start of the day was 16/16/14 R/G/B (I'd predicted 16/15/14). We did worse on TSO'S than I'd hoped, and my team's queue is pretty bad (both mostly for too few food words), so we're definitely falling behind. Red is pulling ahead. Green is still behind; they made it through today better than expected because corwind got a substitute, but are still in bad shape.

I predict 12/11/9 R/G/B at the end of today. Win probabilities 4:1 red, 1:2 blue, 1:4 green.

Summary, midday:
Today was a catastrophe; it's roughly as bad as if I just hadn't given a clue today. I know spymasters are always sad about their teams missing connections they think should be obvious, but I'm putting some time today into figuring out how that happens, because I think my only real chance at this point is to understand that better enough than the other spymasters that they mess up repeatedly and I don't.

Win probabilities 8:1 red, 1:4 green, 1:8 blue.

EUCLID'S:
This was supposed to clue PIE (outing corwind1), NOVEL (finishing PALANTÍR), SCIENTIST (outing corwind3), and GOLD (outing python1). Each of those would've removed 2-4 bystanders from my team's queue. My team's only shooting at NOVEL, which I thought was the weakest of the four, and also the easiest to clue elsehow (via a "NEO-" prefix). That's a worse-than-random hit rate, and instead of outing people, they're spending a lot of shots on people I thought would have been outed, making even today's shots on bystanders worse than random.

Explanations for those:
-PIE is PI, the _Greek_ (like Euclid) letter most used in _geometry_ (which Euclid invented). My team even quoted "Book 12 studies the volumes of cones, pyramids, and cylinders in detail by using the method of exhaustion, a precursor to integration, and shows, for example, that the volume of a cone is a third of the volume of the corresponding cylinder. It concludes by showing that the volume of a sphere is proportional to the cube of its radius (in modern language) by approximating its volume by a union of many pyramids.", and the constant of proportionality is 4/3*PI. Never seen.

-NOVEL because Euclid's Elements is another kind of book. My team got this one.

-SCIENTIST because Euclid is a scientist. Giving the name of a specific scientist seems right up there with some -OLOGIST no one's ever heard of as a way to get SCIENTIST. Jamb saw this but eventually decided it was too weak; what?!?

-GOLD because EUCLID'S autocompletes to ELEMENTS, and GOLD is an element (in fact, the only element on the green board). Never seen.

-Words my team's shooting instead: PYRAMID (ok), SATELLITE (because there's a planned European Union space mission named Euclid---was this really stronger than, say, SCIENTIST?), PLANE (ok), and NOVEL.


How clue-solving works

Disclaimer: This is sort of ranty.

There's implicitly a three-dimensional matrix of possible connections: one dimension is the interpretations of the clue (e.g. geometry, elements, books, Euclid the person, proposed European space missions); another dimension is the Names; the last dimension is the possible interpretations of the Names (e.g. food, round, geometry, American). A day like today has 4-8 interpretations, ~60 names still under consideration, and ~4 interpretations per name. Any cell in that grid could be holding a connection for a clue, for a total of 1000-2000 places to look.
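
For scale, a tiny sketch of that cell count, using only the rough figures in the paragraph above (the variable names are made up):

```python
NAMES = 60              # names still under consideration
MEANINGS_PER_NAME = 4   # interpretations per name

def grid_cells(clue_meanings, names=NAMES, per_name=MEANINGS_PER_NAME):
    """Number of (clue meaning, name, name meaning) cells that could hold a connection."""
    return clue_meanings * names * per_name

print(grid_cells(4), grid_cells(8))  # 960 1920
```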

-Observation one: clue-solving seems to happen in O(number of names+number of clue meanings) or maybe O(number of names*number of clue meanings), not \Omega(number of names*number of clue meanings*number of interpretations per name): that is, people go through the list of names once after thinking of all the clue interpretations, maybe two or three times if there are a lot of clue interpretations, and probably again if they think of a new interpretation like proposed European space missions. In particular:
--A lot more time will be spent on marginal clue meanings than I'd like. When I come up with a clue, I can merge two conceptual categories (e.g. books, geometry) into one clue, maybe three if I'm really ambitious (e.g. generals, chickens as food, chickens as animals), and maybe choose an instance of that category that has a random connection to one other thing (e.g. elements). If there are things that don't really fit categories (e.g. Euclid is the name of a planned European space mission), I can't systematically include more than one of them; if there are ten categories (e.g. food, animals, military, proper names, China, sewing, sowing, so), I can't actually use more than three or so of them, but I have to assume that my team will consider them all. (In fact, empirically, my team puts more time into the weaker conceptual categories, because they think of those later and so go through the whole list with just that one category in mind, whereas the strongest conceptual categories share mindspace for the first run through the list).
---Solution 0: Although my clues can have multiple meanings, Names can't; I should avoid cluing any but the first, most obvious meaning of any True Name: homophones are right out (PIE for PI, KETCHUP for CATCH UP), SCIENTIST has to be an OLOGIST.
---Solution 1: instead of merging conceptual categories and looking for a clue that fits the merger, take random words, find all possible connections to them, see how many of those hit True Names, and repeat until I get a good one. This takes too much of my time.
---Solution 2: give clues for a small enough number that, even if my team generates a gajillion interpretations of it, there'll still be few enough things that fit the strong interpretations that they shoot just those. I thought 4 would be small enough to do this with EUCLID'S, but I guess not. (Seriously, the name of a planned European space mission made it on the list, but not anything for ELEMENTS and not SCIENTIST?)
---Solution 3: Out people to reduce the space of names my team has to look at, and hope they use the extra time to look more carefully at the possible meanings of each one. I was trying to do this today: EUCLID'S was supposed to out three people. Unfortunately, other than NUT, the only three names I can clue to out someone are GOLD, PIE, and SCIENTIST (the three things I thought were strongest for EUCLID'S). If I try to out people with more than one HP, my team has to shoot two words correctly on the same person, which is hard enough for one person, much less two. If I clue GOLD, PIE, and SCIENTIST again, I'm writing off today as actually entirely wasted. This is probably the best thing to do in expected value of Names shot tomorrow, but is a low-variance strategy when I'm behind; maximizing my win chance probably relies on cluing for high throughput on other things today and hoping my team realizes those mistakes before shooting too many more bystanders on the same people.
-Observation two: some cells of that implicit matrix just get missed. Two possibilities for why:
--I know that even when I'm solving regular Codenames clues, my eyes sort of glaze over by the end of a list of just 25 words for a clue I only expect to have one or two interpretations, and I might miss even the clearest connections. I think this was what happened with CENTER: jamb actually went through every possible Name saying "MID[Name]", but CENTER was near the end since she wasn't sure corwind was even green. (CENTER they did see after sleeping.) Similarly, after Jackie said "I don't get why the apostrophe S", Lucy said "Euclid's elements is the first thing google showed me" and then stopped responding to the conversation, which I hoped meant she was looking for elements, but no luck. (I don't know whether she was actually looking at elements then, but definitely all three people on my team proposed ELEMENTS and no one saw it.)
--When people are playing Codenames without really wanting to, they often just don't look at the whole grid of words before guessing, just look at random parts of it until they think they have enough. This is sort of the same problem, one level up: there's an implicit grid (mentioned above), and my team has identified all the dimensions of it but isn't looking through it all.
--It's hard to find a solution to "people might look right at a really great combination like MID-CENTER or ELEMENTS-GOLD and not see it", but I'll try:
--Solution 1: clue things that form compound words as much as possible; call even synonyms, hyponyms (e.g. EUCLID for SCIENTIST), and such nontrivially weaker. (No amount of mental fatigue will make you miss MIDPOINT if you say it. Actually, I just noticed that my team didn't see NUT for WING, so maybe the chance of missed strong connections is just high everywhere. Actually, maybe Jackie just doesn't know that WINGNUTs are a thing since she doesn't follow politics at all and the other meaning of WINGNUT is rare, and that was before the rest of the team was solving clues on the same spreadsheet.)
--Solution 2: This isn't a solution that I can implement this game, but I'm recording it so I know what to do if I'm a spy again: actually make a spreadsheet with the whole Name interpretation-by-clue interpretation matrix, mark every cell in it as unexamined, and then go through them all.
-Observation 3: Last game Jackie was really successful at interpreting my clues with basically her current approach. What changed?
--Hypothesis 1: my burnout as a spymaster, via not having an actual matrix of fits with bystanders and instead just looking through once for my final clue. This explains hitting some extra bystanders (which I'd hoped would matter less because there are more shots available) but not extra missed connections.
--Hypothesis 2: Jackie's burnout: she's spending a lot less time on game this time, particularly on the actually-solving-clues part; it's still a lot of time and she's not actually burnt out, but she definitely has a lot less energy than in the previous game. (Evidence for this: Jasonye walked in to talk about DAC while I was writing this, and Jackie didn't even realize he was talking about DAC; when he clarified, she said "Oh! I'd forgotten there was even a game going", certainly not something that'd've ever happened last game.)
--Hypothesis 3: this game is bigger (27 people instead of 18, proportionally bigger clues), and Jackie's and my way of doing things doesn't scale well enough.
--Hypothesis 4: random variation. Some of this is probably just that we were particularly lucky last game, or particularly unlucky this game.
--Hypothesis 5: the rest of the team. Last time there were 5 extra people (extra in the sense that they're not Jackie, the one person in common between the teams) on the team, several of whom were very active; maybe missed connections just got resolved before I even noticed them. This time there are only 2 extra people, and even though each of them (particularly Lucy) has caught things Jackie missed, there are fewer of them, and I think both of them were more looking for new interpretations of clues (like "ELEMENTS" resuggested by Lucy or proposed names for planned European space missions suggested by Adam) than rechecking the basic interpretations, and it's there that most of the missed connections are.
---Solution 1: without really understanding why, I should try to make this game more like the previous one. I can't fix the number of people on the team. I can try to reduce the search space to closer to last game's (see above). Twice now I've tried to clue a multi-word idea by giving a word that can only be completed to it, and in both cases my team's seen the connection but the clue has flopped anyway; I should just stop doing that.
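
The spreadsheet idea from Solution 2 under observation two (build the whole clue-interpretation-by-Name-interpretation matrix, mark every cell unexamined, then go through them all) can be sketched roughly as below. This is only an illustration under assumptions: the function names are hypothetical, and the sample meanings attached to each Name are made up for the example rather than taken from the actual game sheet.

```python
def build_grid(clue_meanings, name_meanings):
    """Map every (clue meaning, Name, Name meaning) cell to an 'examined' flag."""
    grid = {}
    for clue_meaning in clue_meanings:
        for name, meanings in name_meanings.items():
            for meaning in meanings:
                grid[(clue_meaning, name, meaning)] = False  # start unexamined
    return grid


def mark_examined(grid, clue_meaning, name, meaning):
    """Record that one cell has actually been looked at."""
    key = (clue_meaning, name, meaning)
    if key not in grid:
        raise KeyError(f"no such cell: {key}")
    grid[key] = True


def unexamined(grid):
    """List the cells nobody has checked yet."""
    return [cell for cell, done in grid.items() if not done]


# Illustrative data only (hypothetical meanings).
clue_meanings = ["geometry", "elements", "books"]
name_meanings = {
    "GOLD": ["element", "color"],
    "PIE": ["food", "pi"],
    "CENTER": ["middle"],
}

grid = build_grid(clue_meanings, name_meanings)
mark_examined(grid, "elements", "GOLD", "element")
remaining = unexamined(grid)
print(len(grid), len(remaining))  # 15 14
```

The point of keeping an explicit flag per cell is that a strong pairing like ELEMENTS-GOLD can't be silently skipped: it stays on the unexamined list until someone ticks it off.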

Anyway, that was all a long elaboration on "I'm sad and my team is missing connections that should be obvious", which is how spymasters always feel, but I needed to rant.

Goals for today:
-Don't put any effort into recluing GOLD, SCIENTIST, or PIE. If we're going to have a chance of winning, it'll rely on my team seeing those without my needing to reclue them. They will at least learn that they only shot two words correctly today, and one of those was in {KID, JUPITER, AIR} via parity check, and hence probably not EUCLID, and another one was NOVEL, which they'll be told finished PALANTÍR and so might think (incorrectly) wasn't part of EUCLID, so they'll know they have at least 2 things left for EUCLID, and maybe up to 4.
-Hit NUT (to out jasonye1).
-Out someone else, by DRAFT+KETCHUP, NURSE+LIMOUSINE, SUIT+PLATYPUS, TORCH+STRING, or CHEST+POUND.
-Satisfy a parity check if my team comes up with one, but they seem to have forgotten about that entirely. I'll wait a while longer to come up with my clue hoping they give me a parity check, since whom I have a good chance of killing depends on what info they get via parity check.
-Fulfill as many as possible of the proposed solutions above, or ignore them all, assume my team will realize their mistakes and solve perfectly tomorrow, and just maximize throughput. Probably the latter; I should assume my team's playing optimally both because that's more fun and because I have to.
-Clue for a small number, because my team'll need its shots on yesterday's clue, and I don't want them to put those shots off to tomorrow, because our having any chance relies on my team realizing those and shooting several people dead today.

Today's clue: CASE, 2 or 3
Following one of my solutions above, it gets two things by noun compounds (NUTCASE and SUITCASE), both of which I hope are so strong that they'll just work. The third thing is CHEST; I'm waffling between leaving it out (because 2 would definitely be a shockingly low number, low enough that my team might pay attention to it (15 and 4 seem to have been roughly equivalent) and not scatter shots everywhere, but I'm not sure 3 is, and it's less likely they'd hit WATCH, KEY, THIEF, and NOTE this way (they'll probably hit POINT either way)) or including it (to hit CHEST, to avoid implying by a small number that their queue is any good (but saying a big number didn't make them think their queue was bad), and so that, if they finish the clue today without hitting CHEST, they don't think CHEST is anticlued). If I had a parity check, that'd settle whether to include CHEST.

Phew, at 4:39 Lucy brought up "oh what should we do with our parity check?". Jackie's response is "I can't really think of anything better than just letting Pesto clue as he thinks best"---maybe reasonable, although it'd've been nice to know that before instead of just thinking they'd forgotten about it. ...and that discussion just sort of fizzled without a clear resolution, but I think they're just not doing a parity check.

I think I'll leave it out; even on a clue for 2 they'll likely spend enough shots to hit CHEST.


 Post subject: Re: achester's thoughts
PostPosted: Sun Jan 28, 2018 10:09 pm 
Day 5 thoughts:

Summary, start of day:
HP at the start of the day was 12/14/8 R/G/B (I'd predicted 12/11/9). (See yesterday's thoughts for the miserable difference.)

I predict 8/10/4 R/G/B at the end of today, but with a lot of uncertainty on G: it depends on what interpretation, if any, my team has for the fact that I clued for 2.
Win probabilities 128:1 red, 1:4 green, 1:64 blue.

CASE:

After some initial indignation, my team found NUT and SUIT and is convinced they're the strongest; they might even shoot just those two? It looks like they probably also thought CHEST was the next-strongest, so at first glance it looks like I should've clued for 3. (Haven't seen how many they'll decide to shoot at, though; if they hit CHEST today too because they're uncertain that'd be perfect.) I was trying yesterday to not let the missed clues make me mistrust my team so much that I played suboptimally, but cluing for 2 instead of 3 was in that direction and it turns out that 3 would have turned out better.

Edit: actually, at least TABLET and SCHOOL would apparently have beaten CHEST, so I'm not actually sad about having clued for 2.

Interpretation of 2 for CASE:

One reasonable interpretation of the clue being for 2 is that the team should otherwise just fire at the highest word in their queue for each person. That's the correct thing to do in general. (I'd like it if they symmetrically updated _down_ their liking of their queue when I clue for a big number, by conservation of expected evidence, but alas.) My only hope is that they'll decide not to. Reasons they might not:
-The top reason is that they know they hit at most one thing for EUCLID'S yesterday, but the top of their queue is mostly not for EUCLID'S (they do have three things for EUCLID'S on the queue, but they'd all have been better clued by EUCLID or better still by ACADEMUS or some other Greek philosopher). The other things on their queue are mostly the 15th-ish-best things for TSO'S, for which their priors should be that they're close to random shots, although I guess they should update toward them being correct from the fact that I clued for a small number.

The closest they got to this thought is:
jamb: "I feel like way too many of these rely on the homophone hypothesis" (because most of them are clued by SO, SEW, or SOW, which are slant homophones of TSO. Yesterday people were joking about trying to extract information from me, and I couldn't even say TSO'S lest my pronunciation of it give a clue that the homophones are wrong). Also, alas, you're looking for homophones in the wrong place (the already-overconstrained clue instead of homophones of the Names).

This is wishful thinking at best, and made even more difficult by the fact that AdamYedidia isn't online until long enough into the day (14 hours) that my team will have basically decided on its approach, but I think it's the only chance we have. They realize (at 14 hours) they need to turn it around. Will they realize that means they need to shoot several more things for EUCLID'S? No. Well, it was a long shot anyway.

Lucy: "but ive been very surprised at some the things that weren't hits"
I think those were mostly from the first day or two's clues, when I was still trying to give color information by hitting the occasional really strong bystander (like CRICKET).

Paths to victory:

Well, the possible path to victory is really, really slim at the moment. It certainly relies on Daniel Grazian messing up; I'll take that as a given. Also on green not spending their shots on us; I suspect they did that not because their team's doing badly at determining colors and didn't even know who's red but because they mistakenly thought a trade with red would be a good idea, for some combination of not realizing that red is winning by a lot and just wanting to feel like they're doing something spyish. Also depends on the RNG giving a lot of shots today, to make it even mechanically possible to catch up. Also probably depends on my team deciding to look back at EUCLID'S and seeing several things for it, even though they didn't today. And then also probably depends on me cluing every remaining True Name today, this time without hitting too many bystanders because there's no longer really room in the shot budget.

Names left, in priority order:

jasonye2 has NURSE and LIMOUSINE, both essentially unclued.

jasonye3 has PLATYPUS, essentially unclued.

python2 has TORCH, essentially unclued.

python3 has CHEST and POUND, both essentially unclued.

corwind2 has DRAFT and KETCHUP. DRAFT is at the top of the queue for corwind2, so I could just clue KETCHUP, although they'll probably only shoot at one thing for corwind2, so I could leave KETCHUP for tomorrow. Also, the path to victory already probably relies on them seeing PIE-PI, and maybe when they do that they'll make a pass through for homophones?

corwind1 has PIE. I'll clue this if I can, but the clue's almost certainly overconstrained, and this is the first thing to leave out.

python1 has GOLD. I'll clue this if I can, but the clue's almost certainly overconstrained, and this is the first thing to leave out.

corwind3 has SCIENTIST. Although they don't think it's worth firing on for today, it's at least now the top of their queue for corwind3.

So I need to get at least most of NURSE, LIMOUSINE, PLATYPUS, TORCH, CHEST, and POUND today, rely on all the things out of my control to go right and give me another day, and then get the remainder of those tomorrow and ideally have a little cluing flexibility left to fix queue problems.

A tempting approach is to list meanings for each of the Names and try to find a way to combine those (e.g. LIMOUSINE is a French thing and POUND is British, so maybe a French currency), but solving that'd require looking at many meanings of the Names and not worrying about too many meanings of the clue. My team seems to be erring in the opposite direction, looking for many meanings of my clues and not thinking about many meanings of the Names, so I should ideally find one clue with many meanings that correspond to the words I need to clue. It's much harder to construct a clue that way, so I might not be able to, but I'll see if I can.

The tops of the queues by player are surprisingly decent today (randomly): DRAFT (yes), QUEEN (no), SCIENTIST (yes), HELICOPTER (no), ROME (no), STADIUM (no), PIE (yes), LIMOUSINE and then NURSE (yes and yes), LOCH NESS (no). That makes this a little easier, in that if I can clue precisely enough, they'll shoot some correct words for the wrong reasons (e.g. PIE is on there because of CHICKEN, not EUCLID'S). That makes GOLD more important and LIMOUSINE and NURSE less; makes the most important words to clue KETCHUP, CHEST, POUND, PLATYPUS, GOLD, and TORCH. There are only 15 minutes left in day, not enough time to adjust fully to the new most-important list, but fortunately it overlaps a lot with the old one, and one of the top things on it, TORV, is someone born in GOLD Coast. Maybe as a bonus, that's in QUEENsland, which might push QUEEN down the queue?

As an added complication, if I clue for at least 6, my team says they'll basically ignore their queue; at most 5 and they'll spend at most about 7 shots on it. If I clue only 5 things, there's at least one thing left that they won't shoot; if I clue at least 6, then I need to also clue all 11 words, which is definitely impossible. So I think I clue 5 of them. Conveniently, there are several clues for the old important set that get five of the new six most important words, but not KETCHUP.

Clue for today: TORV for 5
(Remember in reading these explanations that I'm required to clue all these words in one clue today.)
PLATYPUS, because she's Australian.
GOLD, because she was born on GOLD coast and has blonde hair.
CHEST, POUND, and TORCH all because I found her by googling "sexy Australians" (oy, my browser search history), and those are connected via breasts, sex, and "hot", respectively. Also, she's of Scottish descent and the Scots use POUNDs.

...I didn't claim this was likely to work; really, we probably lost our last nontrivial chance of victory two days ago, when EUCLID'S killed 0 people instead of 3.


 Post subject: Re: achester's thoughts
PostPosted: Mon Jan 29, 2018 2:24 am 
Summary, start of day:
HP at the start of the day was 6/11/2 R/G/B (I'd predicted 8/10/4).

I predict 0/6/0 R/G/B at the end of today.
Win probabilities 2^{13}:1 red, 2:1 green, 1:2^{12} blue.

TORV 5:

This worked surprisingly well; 6 of the 13 connections they see for it are True Names. Also, 2 of the top 3 things on their queue are correct.

If they shoot those and spend a shot on the top of their queue, that'd be 7 hits, and green would be down to 4 HP (PIE, KETCHUP, TORCH, POUND) at the end of today, much better than I'd hoped for (although still a long way from actually finishing). Probably 5 HP because there are only 12 shots today, not 14.


 Post subject: Re: achester's thoughts
PostPosted: Mon Jan 29, 2018 2:59 am 
History of clues and HP by day:
Day 1: I clued PALANTÍR 4; HP at the end of the day was 24/23/22 R/G/B. This was a suboptimal clue, which was my fault; I think its suboptimality was reflected in the HP totals. We were a little behind, but not much.
Day 2: I clued MIDWICKET 10; HP at the end of the day was 20/19/19 R/G/B. We caught up a little; HP totals still basically reflect who's winning.
Day 3: I clued TSO'S 15; HP at the end of the day was 16/16/14 R/G/B. Blue was convinced it was losing, which is true, but things were still pretty close.
Day 4: I clued EUCLID'S 4; HP at the end of the day was 12/14/8 R/G/B. This was devastating for blue, particularly because EUCLID'S outed 0 of the 3 people I thought it'd out; I think this day took blue from slight underdogs to essentially lost.
Day 5: I clued CASE 2; HP at the end of the day was 6/11/2 R/G/B. Red has essentially won; today's clue of mine was a long-shot that, predictably, didn't work.
Day 6: I clued TORV 5; HP at the end of the day is probably something like 0/5/0 R/G/B.
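
To make the day-by-day swing easier to see, here is a minimal sketch that computes how much each team's HP dropped from one day's end to the next, using only the end-of-day totals listed above (Day 6 is left out because its total is a guess; the variable names are illustrative):

```python
# End-of-day HP as (red, green, blue), copied from the history above.
hp_by_day = {
    1: (24, 23, 22),
    2: (20, 19, 19),
    3: (16, 16, 14),
    4: (12, 14, 8),
    5: (6, 11, 2),
}

days = sorted(hp_by_day)
losses = {}
for prev, cur in zip(days, days[1:]):
    # HP lost during day `cur` = previous day's total minus this day's total.
    losses[cur] = tuple(a - b for a, b in zip(hp_by_day[prev], hp_by_day[cur]))

for day, (red, green, blue) in losses.items():
    print(f"Day {day}: red -{red}, green -{green}, blue -{blue}")
```

On these numbers blue dropped 6 HP on each of Days 4 and 5, while green dropped only 2 and 3.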

Summary of game:

Daniel Grazian and the red team definitely deserve their (all-but-guaranteed) victory: solid clues interpreted correctly.

It's not clear whether green will squeeze out a tie (edit: he will, by one shot). Quite a comeback. I'm sad that the inconsistency between the treatment of corwind and AdamYedidia for very similar situations probably mattered, since the three shots green saved by getting a substitute for a non-absent player were more than green's margin of making it into the tie.

Blue suffered a lot from our equivalent problem, since the inconsistency between AdamYedidia's and jamb's schedules has made it hard every day for blue to get solving done: jamb wanted to do her solving in the first half of day but didn't have AdamYedidia's information then. (To be clear, these aren't the faults of corwind or AdamYedidia: real life comes first, and they were even both willing to spend plenty of time every day on game, with real life only affecting when they spent it; also AdamYedidia's and jamb's scheduling incompatibility is a symmetric problem between them.)

Color information came sadly much too quickly in this game. Jamb had a great system built to track color probabilities, but I think everyone basically had all the colors figured out on day 2. There just aren't enough possible sets of teams of 3 for it to be hard to figure out, and the drafting thing reduced it even a bit more. I therefore think we didn't really get a test of the new mechanics.

The game was a lot bigger in Codenames space than before: 1.5 times as many names, half as many people solving them on each team, and more shots shifting the equilibrium to bigger clues. It seemed to me as the blue spymaster like this went too far.


 Post subject: Re: achester's thoughts
PostPosted: Tue Jan 30, 2018 11:16 am 
My spymaster sheet's at https://docs.google.com/spreadsheets/d/ ... sp=sharing .

