That's definitely not what they're most useful for. I mean, you probably could use a Bloom filter to implement spell checking, but saying that's where they're most useful severely misses the point of probabilistic set membership queries.
Bloom filters and their relatives are great when you have a huge set of values – eg. 100s of millions of user IDs in some database – and you want a very fast way of checking whether some value might be in that set without having to query the database. Naturally, this assumes that you've prepopulated a Bloom filter with whatever values you need to check against.
If the result of the Bloom filter query is "nope", you know that the value's definitely not in the set, but if the result is "maybe", you can go ahead and double-check by querying the database. This means that checks for values that aren't in the set almost never have to hit that slow DB at all, and even though you'll get some false positives, this'll still be much, much, much faster than having to go through that DB every time.
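A minimal Python sketch of that check-then-confirm pattern, assuming string IDs. The `BloomFilter` class, the `user_exists` helper, and the set standing in for the database are all hypothetical illustrations, not any particular library's API; the filter uses the textbook sizing formulas and derives its k bit positions from a single SHA-256 digest via double hashing.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical, not a specific library)."""

    def __init__(self, capacity: int, fp_rate: float):
        # Textbook sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes
        self.m = math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: position_i = h1 + i * h2 (mod m), all from one digest
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be non-zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set"; True means "maybe"
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def user_exists(user_id, bloom, db_lookup):
    if not bloom.might_contain(user_id):
        return False  # "nope" is definitive, so the database is never touched
    return db_lookup(user_id)  # "maybe" could be a false positive, so confirm


# Hypothetical stand-in for the slow database, recording every query it gets
fake_db = {"alice", "bob"}
db_queries = []


def query_db(user_id):
    db_queries.append(user_id)
    return user_id in fake_db


bloom = BloomFilter(capacity=1000, fp_rate=0.01)
for uid in fake_db:
    bloom.add(uid)

print(user_exists("alice", bloom, query_db))    # True
print(user_exists("mallory", bloom, query_db))  # False
```

Only the "maybe" answers reach `query_db`, which is where the savings come from when most lookups are for IDs that don't exist.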
If you meant my user ID example, you'd prepopulate the Bloom filter with existing user IDs on eg. service startup or whatever, and then update the filter every time a new user ID is added, keeping in mind that the false positive rate will grow as more are added and that at some point you may need to create a new filter with a bigger backing bit array.
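One way that lifecycle could look, as a hedged Python sketch: `UserIdIndex`, its `load_all_ids` callable, and the double-the-capacity growth policy are hypothetical choices for illustration, and `BloomFilter` is a minimal textbook version rather than a library class. Because a Bloom filter doesn't keep the original elements, its bit array can't simply be enlarged in place, so this sketch rebuilds a bigger filter from the source of truth instead.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical, not a specific library)."""

    def __init__(self, capacity: int, fp_rate: float):
        # Textbook sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes
        self.m = math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: position_i = h1 + i * h2 (mod m), all from one digest
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class UserIdIndex:
    """Hypothetical wrapper that regrows its Bloom filter past the design capacity."""

    def __init__(self, load_all_ids, capacity: int, fp_rate: float = 0.01):
        self.load_all_ids = load_all_ids  # callable returning every stored ID, e.g. from the DB
        self.fp_rate = fp_rate
        self._rebuild(capacity)  # prepopulate on startup

    def _rebuild(self, capacity: int) -> None:
        ids = list(self.load_all_ids())
        # The old bits can't be remapped to a larger array, so re-insert everything
        self.capacity = max(capacity, len(ids))
        self.bloom = BloomFilter(self.capacity, self.fp_rate)
        for uid in ids:
            self.bloom.add(uid)
        self.count = len(ids)

    def on_new_user(self, user_id: str) -> None:
        # Assumes the ID has already been written to the source of truth
        self.bloom.add(user_id)
        self.count += 1
        if self.count > self.capacity:
            # Beyond the design capacity the false positive rate keeps climbing
            self._rebuild(2 * self.capacity)


db = ["alice", "bob", "carol"]  # hypothetical stand-in for the user table
index = UserIdIndex(lambda: db, capacity=4)
for uid in ["dave", "erin"]:
    db.append(uid)          # write to the database first...
    index.on_new_user(uid)  # ...then record the ID in the filter
print(index.capacity)  # 8
```

Scalable Bloom filter variants avoid the full rebuild by chaining additional filters, but the rebuild shown here is the simplest version of "create a new filter with a bigger backing bit array".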
Well, yes and no. With a straight-up hash set, you're keeping set_size * bits_per_element bits in memory, plus whatever the overhead of the hash table is, which might not be tenable for very large sets. With a Bloom filter that has eg. a ~1% false positive rate and an ideal k parameter (the number of hash functions; see eg. the Bloom filter wiki article), you're only keeping ~10 bits per element, completely regardless of element size. That's because Bloom filters don't store the elements themselves or even their full hashes: they only tell you whether some element is probably in the set or not, and you can't eg. enumerate the elements in the set. As an example of memory usage, a Bloom filter with a false positive rate of ~1% for 500 million elements would need 571 MiB (noting that although the size of the filter doesn't grow when you insert elements, the false positive rate goes up once you go past that 500 million element count).
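Those figures follow from the standard Bloom filter sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions, with the false positive rate after inserting n elements approximated by (1 - e^(-kn/m))^k. A small Python check of that arithmetic (the function names are just illustrative):

```python
import math


def bloom_size(n: int, p: float) -> tuple[int, int]:
    """Optimal bit count m and hash count k for n elements at false positive rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = round(m / n * math.log(2))
    return m, k


def fp_rate(m: int, k: int, inserted: int) -> float:
    """Approximate false positive rate after inserting `inserted` elements."""
    return (1 - math.exp(-k * inserted / m)) ** k


n = 500_000_000
m, k = bloom_size(n, 0.01)
print(f"{m / n:.2f} bits/element, k = {k}, {m / 8 / 2**20:.0f} MiB")
# 9.59 bits/element, k = 7, 571 MiB

# Same fixed-size filter, once at its design load and once overloaded 2x
print(f"{fp_rate(m, k, n):.3f} at 500M, {fp_rate(m, k, 2 * n):.3f} at 1B")
```

With the array size fixed, pushing twice the design count through the filter raises the estimated false positive rate from about 1% to roughly 16%, which is the degradation described above.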
Lookup and insertion time complexity for a Bloom filter is O(k), where k is the number of hash functions I mentioned; since k is fixed when the filter is created, that's effectively O(1).
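To make the O(k) claim concrete: an insert or lookup computes k bit positions and touches exactly those k bits, no matter how many elements the filter already holds. A common trick, sketched here in Python as one option among several, is double hashing (Kirsch and Mitzenmacher), which derives all k positions from a single digest as h1 + i·h2 mod m. The filter size and k below are illustrative values for ~1,000 elements at ~1%.

```python
import hashlib

M_BITS = 9586  # illustrative: ~1,000 elements at ~1% false positives
K_HASHES = 7


def bit_positions(item: bytes, k: int = K_HASHES, m: int = M_BITS) -> list[int]:
    """Derive k bit positions from one BLAKE2b digest via double hashing."""
    digest = hashlib.blake2b(item, digest_size=16).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:], "little") | 1  # odd step; M_BITS is even, so h2 mod m != 0
    return [(h1 + i * h2) % m for i in range(k)]


bits = bytearray(M_BITS)  # one byte per bit, for readability


def insert(item: bytes) -> None:
    for pos in bit_positions(item):  # exactly k writes
        bits[pos] = 1


def query(item: bytes) -> bool:
    return all(bits[pos] for pos in bit_positions(item))  # at most k reads


insert(b"user:42")
print(query(b"user:42"))  # True
```

Neither `insert` nor `query` ever looks at how many items were inserted before, which is why the cost stays constant as the set grows.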
Probabilistic set membership queries are mainly useful when you're dealing with ginormous sets of elements that you can't shove into a regular in-memory hash set. A good example in the wiki article is CDN cache filtering:
Nearly three-quarters of the URLs accessed from a typical web cache are "one-hit-wonders" that are accessed by users only once and never again. It is clearly wasteful of disk resources to store one-hit-wonders in a web cache, since they will never be accessed again. To prevent caching one-hit-wonders, a Bloom filter is used to keep track of all URLs that are accessed by users. A web object is cached only when it has been accessed at least once before, i.e., the object is cached on its second request.
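The admission policy in that quote can be sketched in a few lines of Python. This is a hypothetical illustration, not how any particular CDN implements it: a dict stands in for the disk cache, `fetch` stands in for the origin server, and `BloomFilter` is a minimal textbook version. A false positive just means an occasional one-hit-wonder gets cached on its first request, which costs a little disk space but nothing else.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical, not a specific library)."""

    def __init__(self, capacity: int, fp_rate: float):
        # Textbook sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes
        self.m = math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: position_i = h1 + i * h2 (mod m), all from one digest
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class OneHitWonderCache:
    """Hypothetical cache that admits an object only on its second request."""

    def __init__(self, expected_urls: int, fp_rate: float = 0.01):
        self.seen = BloomFilter(expected_urls, fp_rate)  # URLs requested at least once
        self.cache = {}  # stand-in for the on-disk cache

    def get(self, url, fetch_from_origin):
        if url in self.cache:
            return self.cache[url]  # cache hit
        body = fetch_from_origin(url)
        if self.seen.might_contain(url):
            self.cache[url] = body  # second (or later) request: worth the disk space
        else:
            self.seen.add(url)  # first sighting: remember it, but don't cache yet
        return body


origin_requests = []


def fetch(url):
    origin_requests.append(url)  # stand-in for a trip to the origin server
    return f"<contents of {url}>"


cdn = OneHitWonderCache(expected_urls=10_000)
for _ in range(3):
    cdn.get("/a.png", fetch)
print(len(origin_requests), "/a.png" in cdn.cache)  # 2 True
```

The first two requests go to the origin, the second one also populates the cache, and the third is served locally; a URL requested only once never takes up cache space.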
Ukraine’s troops withdrew from several areas of the country’s north-east amid mounting pressure from a new Russian offensive, as the president, Volodymyr Zelenskiy, postponed all foreign trips underscoring the seriousness of the threat....
It's like world leaders want Russia to win, making a show of donating insignificant amounts of old weapons systems they had lying around but not actually doing enough to ensure Ukraine's victory.
Right, and which part of that makes saying "the problem isn't just the US" untrue? If anything, that underlines the fact that everybody should be chipping in more.
Musk, the employees said, was not pleased with Tinucci’s presentation and wanted more layoffs. When she balked, saying deeper cuts would undermine charging-business fundamentals, he responded by firing her and her entire 500-member team.
The dude's a petulant child. No wonder conservatives fawn over him.
With the latest version of Firefox for U.S. desktop users, we’re introducing a new way to measure search activity broken down into high level categories. This measure is not linked with specific individuals and is further anonymized using a technology called OHTTP to ensure it can’t be connected with user IP addresses....
You wouldn't be an asshole otherwise. Maybe some beautiful day you'll realise being an insufferable twat might not be the best approach to life, but I'm not going to hold my breath.
This reeks of someone who uses the word "woke" unironically.
Edit: I couldn't help my curiosity. Turns out not only do they use the word "woke" unironically, they seem to think that grown men dating teenagers is A-OK, because of course they do:
As someone who actually does have chronic diarrhea and shat their pants just today (yay autoimmune diseases), I agree. "I wouldn't wish this on my worst enemy" can get fucked, I absolutely do wish this shit (🥁) on them.
At least here in the EU, the ePrivacy directive and, to a lesser extent, the GDPR generally require that cookies have a limited lifetime depending on their function, to eg. prevent companies from just attaching a stable identifier to every random passerby essentially forever. @Sunny, if you're feeling particularly mildly infuriated, you could email the German Data Protection Authority; there's a good chance the cookie could attract the Eye of Sauron.
The directive itself is kind of involved because it goes pretty deep into what its aim is and eg. what sort of information can be considered an identifier, and it's actually quite well argued and worth a read if that sort of thing is your, er, thing: https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX%3A32002L0058 (you need to scroll aaalll the way down to see the body text). I had to deal with this stuff professionally when I was the CTO of a company with stricter than average privacy requirements due to its field, and I was pleasantly surprised to find out how much sense ePrivacy and GDPR actually make.
A Georgia business owner who bragged that he “fed” a police officer to a mob of rioters storming the U.S. Capitol on Jan. 6, 2021, was sentenced on Thursday to nearly five years in prison for his repeated attacks on law enforcement during the insurrection....
“Ethnically cleanse,” he said at one point, summing up his idea for a city purged of Blues (this, he says, will prevent Blues from ethnically cleansing the Grays first).
Conservatives are incredibly fucked up. They can't fathom coexisting with people who aren't like them without wanting to "ethnically cleanse" them, so they naturally assume everybody else thinks like this as well.
It's great that this article linked to the original journal article. Nice that it's open access, too! So good to see that it's becoming more common. The academic publishing business is just so… well, in a word, fucked.
I've run into this problem with many open source projects. It's sometimes really hard to find out what the hell something actually does based on just the project's own pages. It took a while for join-lemmy.org to actually describe what Lemmy is, for example, instead of just going on about it being open source and secure and federated and blah.
Yeah, I really don't know where hostility against newbies (actual or perceived) comes from in nerd circles. It's been like this for as long as I can remember, and I've been using Linux since the late '90s and fucking around on the Internet for over 30 years now. At least things are way better than they used to be, but it's still sometimes a bit of a bumpy ride.
This is a shitty idea because anyone driving under the influence deserves to be tried and punished for their actions. But if this could be done, you’d have no trouble selling each pill for $1000....
NonCredibleFetish ( sopuli.xyz )
The biggest problems in the vast universe [Elder Cactus] ( lemmy.world )
Okay, yeah, I see now how it can be misinterpreted when taken out-of-context ( files.catbox.moe )
Based on a true story:...
xkcd #2934: Bloom Filter ( imgs.xkcd.com )
https://xkcd.com/2934...
Billionaires urged NYC mayor to sic police on Columbia encampment ( www.theverge.com )
Arizona officials say they can’t find Rudy Giuliani to serve him with indictment notice ( www.cnn.com )
Ukraine’s troops withdraw from parts of north-east as pressure mounts ( www.theguardian.com )
The inside story of Elon Musk’s mass firings of Tesla Supercharger staff ( finance.yahoo.com )
Archive.org link...
Firefox version 126 introduces search data telemetry collection and enhanced copy without site tracking option ( blog.mozilla.org )
Slovakia’s prime minister Robert Fico ‘in life-threatening condition’ after shooting – Europe live ( www.theguardian.com )
Look buddy ( sopuli.xyz )
oh… ( sopuli.xyz )
I don't need any that in my silly little life at all ( sopuli.xyz )
UK toddler has hearing restored in world first gene therapy trial ( www.theguardian.com )
U.S. soldier detained in Russia and accused of stealing, officials say ( www.nbcnews.com )
The soldier, who was stationed in South Korea, traveled to Russia on his own to visit a woman he was romantically involved with, officials said....
Trump VP hopeful Kristi Noem suggests Biden's dog Commander should also be put down ( www.cnbc.com )
KEY POINTS...
The Duration Time on this Cookie... ( slrpnk.net )
Remember to use ad blockers and DNS filters ladies and gentlemen!...
dog dog dog ( sopuli.xyz )
Like getting 9 women pregnant and expecting a baby in 1 month ( sh.itjust.works )
Seal ions ( sopuli.xyz )
Man who bragged that he 'fed' an officer to the mob of Capitol rioters gets nearly 5 years in prison ( apnews.com )
It definitely *was* a good idea though ( sopuli.xyz )
"His name is Mongo" ( sopuli.xyz )
Microsoft and IBM make MS-DOS 4.00 Open-Source ( alternativeto.net )
The Tech Baron Seeking to “Ethnically Cleanse” San Francisco ( newrepublic.com )
cross-posted from: https://lemmy.ml/post/14962209...
Now all we need is a drink pairing guide ( sopuli.xyz )
The bird that came back from the dead by evolving twice [LiveScience] ( www.livescience.com )
Our Response to Hashicorp's Cease and Desist Letter | OpenTofu ( opentofu.org )
https://feddit.nu/pictrs/image/c2330506-c03b-4012-bed5-873302f291a4.png...
Hobbes ( sopuli.xyz )
What do you think? ( sopuli.xyz )
Author: www.instagram.com/lucasturnbloom, www.patreon.com/HowToCat
Sample gut flora of people with "auto-brewery syndrome", culture the samples, and sell them in pill form to DUI lawyers for their clients.