i think this looks better
I wonder if it would be possible to make semantic embeddings using synonyms/antonyms from a thesaurus, though they'd probably suck.
At that point you might as well leave it as a graph rather than trying to reduce the number of dimensions.
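For fun, a minimal sketch of what the graph version could look like, where distance is just BFS hop count over the synonym graph (the `synonymPairs` data here is hypothetical placeholder data standing in for a real thesaurus dump):

```js
// Crude semantic distance as BFS hop count over a synonym graph.
// `synonymPairs` is hypothetical placeholder data, not a real thesaurus.
const synonymPairs = [
  ["happy", "glad"],
  ["glad", "cheerful"],
  ["cheerful", "merry"],
  ["sad", "unhappy"],
];

const graph = new Map();
for (const [a, b] of synonymPairs) {
  if (!graph.has(a)) graph.set(a, new Set());
  if (!graph.has(b)) graph.set(b, new Set());
  graph.get(a).add(b);
  graph.get(b).add(a);
}

function hopDistance(from, to) {
  if (from === to) return 0;
  const seen = new Set([from]);
  let frontier = [from];
  for (let depth = 1; frontier.length > 0; depth++) {
    const next = [];
    for (const word of frontier) {
      for (const neighbour of graph.get(word) ?? []) {
        if (neighbour === to) return depth;
        if (!seen.has(neighbour)) {
          seen.add(neighbour);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }
  return Infinity; // disconnected components, e.g. "happy" -> "sad" here
}

console.log(hopDistance("happy", "merry")); // 3
console.log(hopDistance("happy", "sad"));   // Infinity
```

Antonyms would presumably need their own edge type, which is exactly the kind of structure a flat embedding throws away.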
I should release a page where users can load and test out different semantic embeddings just so my effort doesn't feel wasted.
I implemented Semantle but it feels bad to play because of what I mentioned earlier about average distance scores.
I also used a smaller dictionary so it runs out of good words for the top 1000.
Everyone online says GloVe and word2vec are equivalent apart from slightly different methodologies, but I believe GloVe is significantly worse when it comes to rating the distance between two average words.
It might simply be the case that words are either close or far, but GloVe's scores for averagely connected words seem much higher than word2vec's, and I don't understand why.
The top 1000 words are good in both cases.
Actually, I haven't tested with word2vec yet; I'm just saying this from my experience comparing my scores to Semantle's scores.
Investigating semantic distances has disenchanted Semantle for me because now the results feel like meaningless noise and I don't have faith in the logic behind the lower (non-top 1000) scores.
Early-game feels like guessing rather than trying to reason about what slightly-closer concepts might be.
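I haven't actually run the word2vec comparison, but the GloVe half is easy to poke at. A minimal sketch, assuming GloVe's plain-text format (one word plus its floats per line) and that Semantle-style scores are just cosine similarity: sample random pairs from the top of the frequency-sorted file and see where the scores cluster.

```js
// Sketch: load GloVe-format vectors and look at cosine similarity for random
// word pairs, to see where "average" pairs land. Assumes the plain-text
// format "word v1 v2 ... vd" per line. Node.js.
const fs = require("fs");
const readline = require("readline");

async function loadVectors(path, maxWords = 20000) {
  const vectors = new Map();
  const rl = readline.createInterface({ input: fs.createReadStream(path) });
  for await (const line of rl) {
    const parts = line.split(" ");
    vectors.set(parts[0], Float32Array.from(parts.slice(1), Number));
    if (vectors.size >= maxWords) break; // file is frequency-sorted
  }
  return vectors;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / Math.sqrt(na * nb);
}

(async () => {
  const vectors = await loadVectors("glove.6B.50d.txt");
  const words = [...vectors.keys()];
  for (let i = 0; i < 10; i++) {
    const a = words[Math.floor(Math.random() * words.length)];
    const b = words[Math.floor(Math.random() * words.length)];
    console.log(a, b, cosine(vectors.get(a), vectors.get(b)).toFixed(3));
  }
})();
```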
The difficulties I was experiencing were related to file sizes. I actually do need the 300d embeddings and a large wordlist; the 50d embeddings suck.
i made a lot of progress but didn't write about it here. the inherent limitations are constraining me
The smallest 50-dimensional vectors in glove.6B.50d.txt are only 167MB, so potentially feasible to handle in the browser. Out of 400K words, 64K can be removed because they contain non-alphanumeric characters (counting dashes).
They're sorted by frequency, so I wonder how many more it's safe to remove. Even halfway through the file, almost all of the words look like garbage, but there are occasionally a few that seem useful.
Maybe I could cross-reference another dataset though there's no guarantee it would be any better, and it's not a very elegant solution.
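The mechanical part of the trimming is easy since the file is frequency-sorted. A sketch where the regex, the 100K rank cutoff, and the optional dictionary cross-reference (e.g. /usr/share/dict/words) are all knobs to tune rather than anything principled:

```js
// Sketch: trim glove.6B.50d.txt to plausible words. The file is sorted by
// frequency, so a rank cutoff doubles as a frequency cutoff. The regex and
// cutoff are guesses to tune; the dictionary cross-reference is optional.
const fs = require("fs");
const readline = require("readline");

const WORD_RE = /^[a-z]+$/; // tweak to allow digits or dashes if wanted
const MAX_RANK = 100000;    // past this, mostly garbage anyway

async function trim(inPath, outPath, dictPath = null) {
  const dict = dictPath
    ? new Set(fs.readFileSync(dictPath, "utf8").split("\n").map(w => w.toLowerCase()))
    : null;

  const out = fs.createWriteStream(outPath);
  const rl = readline.createInterface({ input: fs.createReadStream(inPath) });
  let rank = 0, kept = 0;
  for await (const line of rl) {
    if (++rank > MAX_RANK) break;
    const word = line.slice(0, line.indexOf(" "));
    if (!WORD_RE.test(word)) continue;
    if (dict && !dict.has(word)) continue;
    out.write(line + "\n");
    kept++;
  }
  out.end();
  console.log(`kept ${kept} of ${rank} words`);
}

trim("glove.6B.50d.txt", "glove.trimmed.txt");
```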
My guess is that the number of dimensions in the vector will not matter much for my use case and I will be safe using 50d, but I should probably test this with some example words.
I could maybe go up to 100d if I found a way to remove non-dictionary words, but probably not any higher because the files get too big.
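A sketch of that test, comparing how the 50d and 300d files score a few probe pairs (the pairs are arbitrary picks of mine, and both files are assumed to share the same plain-text format):

```js
// Sketch: compare how 50d vs 300d vectors score the same probe pairs.
// Streams each file and keeps only the probe words, so memory stays small.
const fs = require("fs");
const readline = require("readline");

const PROBES = [["cat", "dog"], ["cat", "piano"], ["king", "queen"], ["king", "lettuce"]];
const WORDS = new Set(PROBES.flat());

async function loadSome(path) {
  const vectors = new Map();
  const rl = readline.createInterface({ input: fs.createReadStream(path) });
  for await (const line of rl) {
    const word = line.slice(0, line.indexOf(" "));
    if (!WORDS.has(word)) continue;
    vectors.set(word, line.split(" ").slice(1).map(Number));
    if (vectors.size === WORDS.size) break;
  }
  return vectors;
}

const cosine = (a, b) => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / Math.sqrt(na * nb);
};

(async () => {
  for (const path of ["glove.6B.50d.txt", "glove.6B.300d.txt"]) {
    const v = await loadSome(path);
    for (const [a, b] of PROBES)
      console.log(path, a, b, cosine(v.get(a), v.get(b)).toFixed(3));
  }
})();
```

If the relative ordering of the pairs holds up at 50d, that's probably good enough for a game.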
The Divergent Association Task (which I think is worse than worthless for its stated purpose...) uses GloVe, not word2vec.
It's still doing server-side processing, but it looks like different datasets are available. They used the Common Crawl dataset. It might be easier to make use of this since the embeddings aren't in some specific binary format.
GloVe just needs basic vector math to use the vectors. Someone could probably have made a real JS implementation of word2vec too if they'd figured out its binary format.
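Out of curiosity, a sketch of what a .bin parser might look like, based on what the word2vec C code appears to write (an ASCII header line, then per entry: the word, a space, raw float32s, usually a trailing newline). readFileSync won't survive the 3.5GB Google News file, so a real version would have to stream, but the format logic is the same:

```js
// Sketch: parse the word2vec .bin format in Node. Header is "vocabSize dims\n";
// each entry is word bytes, a space, then dims raw float32s (little-endian on
// the machines these files typically come from), often followed by '\n'.
const fs = require("fs");

function parseWord2VecBin(path, maxWords = Infinity) {
  const buf = fs.readFileSync(path);
  let pos = buf.indexOf(0x0a); // end of header line
  const [vocabSize, dims] = buf.toString("ascii", 0, pos).split(" ").map(Number);
  pos++;

  const vectors = new Map();
  for (let i = 0; i < Math.min(vocabSize, maxWords); i++) {
    const space = buf.indexOf(0x20, pos); // word ends at the space
    const word = buf.toString("utf8", pos, space);
    pos = space + 1;
    const vec = new Float32Array(dims);
    for (let d = 0; d < dims; d++, pos += 4) vec[d] = buf.readFloatLE(pos);
    if (buf[pos] === 0x0a) pos++; // skip trailing newline if present
    vectors.set(word, vec);
  }
  return vectors;
}

// e.g. parseWord2VecBin("GoogleNews-vectors-negative300.bin", 50000)
```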
node-word2vec just calls the C implementation of word2vec. What's even the point of that.
The word2vec node package says it only works on Unix systems, so I guess I can't use it with webpack. The dataset Semantle uses is 1.6GB (and 3.5GB unzipped...) which I can't really load into the browser.
Maybe the way to go is to precompute values and limit the selection of words that can be used. Each player could have a selection of a few words to choose from, taken from a trimmed down dataset.
I feel that would defeat the point though, you're supposed to be able to think of anything. Maybe I can just remove uncommon words and it wouldn't be too noticeable.
I think even if I reduced the vocabulary size, I would still need to store the data as vectors rather than precompute the actual similarity values, since the vectors are a much more compact representation than a full pairwise similarity table.
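To put rough numbers on it (assuming a 20K-word vocabulary and 4-byte floats): 20,000 words × 50 dims × 4 bytes is about 4MB of vectors, while precomputing all pairwise similarities would be 20,000 × 19,999 / 2 ≈ 200M pairs, around 800MB. The vectors are effectively the compressed form.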
I had some ideas for word games based on semantic distance; it will be an interesting problem to see if I can do this performantly in the user's browser. I think Semantle precomputes data for each day's puzzle, but I want to calculate it dynamically.
It would be cool to make it multiplayer as well, but I've never tried to make a multiplayer game before; it's probably annoying. Maybe there could be local multiplayer or playing against the computer to begin with.
I've also been slowly working on a web-based text adventure puzzle game.
bored, made a simple noita mod
fixed the bug mentioned in #177, and added post tag :)
the hundred prisoners problem is cool
from a description of the optimal strategy for one of the variants, though it applies more generally to the solution of the main problem:
"The success of the strategy is based on building a correlation between the successes and failures of the two players."
this feels like a sort of magic because you sacrifice any chance of success the doomed worlds had to make the other worlds a guaranteed success
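a quick monte carlo sketch of the cycle-following strategy for the main problem (each prisoner opens their own box first, then the box named by the slip they find). the correlation shows in the failure mode: when the permutation has a cycle longer than 50, every prisoner on that cycle fails together.

```js
// Monte Carlo check of the cycle-following strategy for the 100 prisoners
// problem: each prisoner starts at the box with their own number and follows
// the slips. All 100 succeed iff no cycle in the permutation exceeds 50.
function shuffle(arr) {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

function allEscape(n = 100, maxOpens = 50) {
  const boxes = shuffle([...Array(n).keys()]); // boxes[i] = slip in box i
  for (let prisoner = 0; prisoner < n; prisoner++) {
    let box = prisoner, found = false;
    for (let opens = 0; opens < maxOpens; opens++) {
      if (boxes[box] === prisoner) { found = true; break; }
      box = boxes[box];
    }
    if (!found) return false;
  }
  return true;
}

let wins = 0;
const TRIALS = 100000;
for (let t = 0; t < TRIALS; t++) if (allEscape()) wins++;
console.log(wins / TRIALS); // ≈ 0.31, vs ~8e-31 if everyone guesses randomly
```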
my intuition is that this could be exploited somehow to create a cryptosystem that would be probabilistic but whose chance of failure is infinitesimally small, but i'm not smart enough to figure out how
so i'm starting to think my intuition was wrong
but i don't think it would necessarily have to make use of the permutation property that's unique to the 100 prisoners problem; maybe it could more generally make use of a probabilistic operation with some redundancy
i got sidetracked and looked up a bunch of stuff and actually i think they already made it
but i don't even understand it
the single post thing is still very broken actually x_x
inconsistency between default behaviour of linearGradients in firefox/chrome
guess i need to fix images too, or just change the URL to be /post/[id].html instead of /post/[id]/index.html
the former is more sovlful
and i should add [post] tag
for another time