2024: The Year of Infinite Infosec Fail

Posted on March 30, 2024 by tacit

First up in today’s game of “who fed it and who ate it:” Artificial Intelligence.

AI is everywhere. AI chatbots! AI image generators! And now, AI code assistants, that help developers write computer programs!

Only here’s the thing: AI doesn’t know anything. A lot of folks think these AI systems are, like, some sort of huge database of facts or something. They aren’t. They’re closer to supercharged versions of the autocomplete on your phone.

Which means if you ask an AI chatbot or code generator a question, it does the same thing autocomplete does: fills in syntactically correct words that are likely to come after the words you typed. There is no intelligence. There is no storehouse of facts it looks up.

That’s why AI is prone to “hallucinations”—completely imaginary false statements that the AI systems invent because the words it uses are somehow associated with the words you typed.

AI Fembot says: The Golden Gate Bridge was transported for the second time across Egypt in October of 2016. (Image: Xu Haiwei)

So, code generation.

AI code generation is uniformly terrible. If you’re asking for anything more than a simple shell script, what you get likely won’t even compile. But oh, it gets worse. So, so much worse.

AI code generators do not understand code. They merely produce output that resembles the text they were trained on. And sometimes, they hallucinate entire libraries or software packages that do not exist.

Which is perfectly understandable once you get how AI LLMs work.

What’s particularly interesting, though, is that malware writers can write malware, give it the same name as the packages AI code generators make up out of thin air, and devs will download and install them just because an AI chatbot told them to.

Bet you didn’t have that on your “Reasons 2024 Will Suck” bingo card.

And speaking of things that suck:

I woke this morning to a message from Eunice that a popular, trusted developer had inserted malicious code in an obscure Linux library he maintains, code that would allow him to log in and access any Linux system that his library is installed on.

In February, then again in March, the developer released updates to a library called “XZ Utils.” The update contained weird, obfuscated code—instructions that were deliberately written in a manner to conceal what they did—but because he was a trusted dev, people were just like 🤷‍♂️. “We don’t know what this code he added does, but he seems an okay guy. Let’s roll this into Linux.”

He seems a decent fellow. We don’t know what this code does, but what’s the harm? (Image: Zanyar Ibrahim)

Fortunately it was spotted quickly, befure it ended up widely used, so only a handful of bleeding-edge Linux distros were affected, but still:

What the actual, literal fuck, people??!

“This library contains obfuscated code whose purpose has been deliberately concealed. What’s the worst that can happen?”

Jesus. And it’s only March.

Developers should never be allowed near anything important ever.

Beware Bowdlerization of Google Docs

Posted on March 28, 2024 by tacit

Image: David Pennington

I write novels almost exclusively in Google Docs.

It’s an aggressively mediocre word processor with two killer features: you have access to it wherever and from whatever device you have Internet access, and it is hands-down the absolute best thing out there for collaborative writing. Nearly all my books are co-written with other people. Google Docs makes this effortless; in fact, many’s the time I’ve been working with Eunice or my Talespinner as both of us type in the same Docs file at the same time.

Even when we aren’t writing at the same time, Google Docs makes it easy for us to leave notes to each other within the same document. It’s no exaggeration to say Docs is probably the best thing to happen to collaborative writing since the invention of the fountain pen.

So you can imagine when I opened my Messenger app a couple days ago and found a message from my co-author Eunice linking to a story by a writer who’d lost access to Google Docs and her manuscript because they contained sexually explicit content.

I’ve spent the last couple of days poring over the Google Terms of Service, and what I found is…worrisome.

Many of the novels I write contain sex. Some of them contain a lot of sex; the Passionate Pantheon series Eunice and I write, a far-future post-scarcity science fiction series where residents of the City worship AI gods through highly ritualized group sex, is a vehicle for us to explore sexual ethics, philosophy, and society in a setting where attitudes toward sex and violence are pretty much exactly the opposite of what they are here in the real world. And these books have tons of sex, some of it so kinky the kinks don’t even have names—we looked.

Naturally, the notion that Google can terminate your Google account and delete your manuscripts in progress for (consensual adult) sexual content is a little alarming.

The issue seems to be Google’s March 2024 anti-spam update.

What does spam have to do with sex and Google? Glad you asked.

More and more often, I am seeing spam that directs to Google properties: Google Sites and Google Docs, mostly. The spammers link to a Google page, which has a link that goes on to the spam site.

Why? Because it keeps the spam emails from being filtered by anti-spam filters (Google links aren’t flagged as spam) and helps prevent the spammers from having their sites shut down.

Sex spammers especially seem to be flocking to Google:

If you click on the link, you’re taken to a Google Site (as in this example) or a Google Doc that then contains a link to the spam site. The Google page includes a little circle-I icon that, if you click on it, brings up the option to report the Google Site or Google Doc for abuse.

If you hit the Report Abuse link, one of the options is “Sexually Explicit.”

So. It seems Google doesn’t permit sexually explicit content. But is that actually part of the Google Terms of Service? Well, kinda.

Here’s the relevant part of the Google Terms of Service:

This…isn’t actually terribly clear. It forbids distributing sexually explicit material, though it doesn’t ban creating sexually explicit material, nor does it say what constitutes “distributing.”

So.

What follows is a completely unofficial speculation about what might be happening and what you might be able to do about it. I claim no insider knowledge of Google’s policies; this is simply informal noodling about the situation.

There are several ways to share a Google Doc. You can invite specific people to see it, and give them different levels of access (read only, comment, propose changes, edit, and so on). You can set it up so that anyone who has the URL can read the document, but can’t make any changes. The way you share it affects what people who view it will see.

If you invite specific people to be able to see and/or comment on the document, they will not see the little information bubble that gives them the option to report the site to Google’s abuse team.

If you set the document up so that anyone with the link can see it, which is what spammers do, then anyone who views the document will see the option to report the document for abuse.

I think—and let me emphasize again this is not based on insider knowledge of anything happening at Google—I think what’s happening is that authors who share Google docs with beta readers may be sharing it by setting the document up so that everyone who has the link can see the doc, and people are reporting the doc.

Why? Unknown. Maybe they’re undermining an author they personally don’t like. Maybe they’re just busybodies.

Point is, Google is a big company, with billions of files and docs on Google Sites and Google Docs and so forth, and they’re not generally proactive about deleting content that violates their terms. They’re reactive—they take action when someone calls attention specifically to a doc or file or page.

So it would seem that they consider sharing a read-only link to be “distribution,” and authors who “distribute” sexual content this way are prone to getting their stuff deleted.

If that’s true, what does it mean?

First of all, it suggests that sharing docs with sexual content to beta readers or reviewers is very dangerous. One person clicking that “report abuse” link may be all it takes to lose access to your Google Docs.

So if you’re sharing content with beta readers, especially beta readers you haven’t individually vetted, don’t do it by sharing a publicly-accessible link to any Google content. Create a Word file and share that, or host the copy you share on your own site…basically anything else.

But it also suggests that in the future, should they want to, Google can decide to be less reactive about enforcing their terms and simply search for sexual words or phrases. It would be trivial of them to do so. Their current terms forbid “distributing” sexual content, but of course they decide what distributing means, and they can change that whenever they feel.

The second thing it means is back up your Google content!

You can download from a Google doc to a Word file easily; it’s in the File menu in Docs.

Back up early. Back up often. (I’ve long had a policy of downloading Google Docs after every major change, because Google has been known to accidentally lose files, but this recent development has me doing so even more aggressively).

I plan to continue using Google Docs to write manuscripts. Thankfully, I don’t share the docs to dog+world, so I’m not likely at risk of having a malicious rando report me.

But I will continue to keep local copies of everything, and I’m in search of a replacement for Google if things should go pear-shaped.

Anyone out there who knows of any good collaborative writing tools, please shout out in the comments!

Donating Cycles, Making the World Better

Posted on February 6, 2024 by tacit

One of the things I generally try to do is leave the world in a slightly better state than I found it. Of course, I’m not always perfect at that, but on the whole I think it’s a good goal to shoot for.

To that end, I recently started participating in BOINC again.

If you haven’t heard of it, BOINC is a system where nonprofit science research teams can solve computationally complex problems without having to build or buy time on horrifically expensive supercomputers, by using all the spare idle computation time of ordinary people who leave their computers on even when they aren’t using them. BOINC detects when your computer is idle, and donates CPU cycles to researchers, basically making your computer part of an enormous ad-hoc supercomputer. You can choose what research projects you want to participate in.

Back when I lived in Canada, I joined BOINC and allowed them to use my laptop to look for new treatments for diseases by studying protein folding.

I dropped out of BOINC when I came back to the US from Canada, but I’ve just re-joined again.

This is my old 2012 laptop, which now does nothing but BOINC. I’ve joined two research projects, Rosetta@Home (which does research on protein folding to look for new drugs and disease treatments) and World Community Grid (which looks for genetic markers for cancer and searches for cures for diseases that are too uncommon or appear in parts of the world too impoverished to be worthwhile for conventional for-profit pharmaceutical companies).

I have a computer that is essentially a backup Time Machine server and Web server, and I may run BOINC on that as well.

I would encourage anyone out there who wants to help solve real problems by donating idle computer time to join.

Basically, you just install the BOINC software, choose a research project from a list, and that’s it.

BOINC stops running whenever you use your computer, so it won’t slow you down, but it means your computer time isn’t being wasted whenever your computer is turned on but you aren’t sitting in front of it.

fly.io, SMS spam, and malware

Posted on January 5, 2024 by tacit

[Edit 11-Jan-2023] I’ve received a reply from Fly.io; see end of this entry

Ah, a new year has come. Out with the old, in with the new…strategies for phish and malware sites, that is.

And what would phish and malware sites be without complicit webhosts and web service providers?

So today I’m going to dive into an enormous quantity of SMS text message spam I’ve been flooded with over the past couple of months, who’s behind it, and what it’s doing.

It started in mid-November of last ear (2023), with a text message saying “The USPS package arrived at the warehouse but could not be delivered” and a link to a site that was just a random collection of letters and numbers. No biggie, I get these all the time. Standard run of the mill phish attempt. If you visit the link, you’re taken to a site that looks like the Post Office, but it’s a fake, of course. They ask you to type a bunch of personal information, which the people responsible will use to steal your identity, get loans in oyur name, whatever.

Then I got another. And another. And another. And another. And then dozens more, coming in one, two, three, four, sometimes five or more a day.

And they haven’t stopped.

Text message after text message after text message. “You’ve been infected with viruses.” “Your cloud service has been terminated.” “We couldn’t deliver your package.”

All of them with URLs that looked like random strings of letters and numbers.

So my spidey sense was activated, and I looked up all those URLs.

Surprise, surprise, every single one is hosted on the same web service provider, an outfit called fly.io.

And there are a lot of them.

*** CAUTION *** CAUTION *** CAUTION ***
THESE LINKS ARE LIVE AS OF THE TIME OF WRITING THIS. Many of these links will bring you to malware or phish sites. DO NOT visit these links if you don’t know what you’re doing.

I started collecting the URLs from the text messages:

http://eonmpxm.com/OR73bg5L
FakeAV malware site
http://wkcetku.com/G1LO5X38
Fake “government subsidy” site
http://nztkspy.com/MK2RVeJg
FakeAV malware site
http://lkxsxef.com/KJeQ09Vp
FakeAV malware
http://klxnitq.com/oxp18G47
Equifax phish
http://epgguli.com/0M37VmkO
McAfee phish
http://yonxutn.com/1MZbOrZv
FedEx phish
http://zveeyou.com/7Xy1E8G8
FakeAV malware
http://mirumbf.com/KJeQ09Vp
FakeAV malware
http://mirumbf.com/KJeQ09Vp
FakeAV malware
http://qjkwmww.com/yng4eExR
Fake USPS phish
http://wnddwet.com/KJe40qm5
FakeAV malware
http://pdxftwt.com/ER39R0rR
XFinity phish
http://plefaas.com/rNzdEAEW
FakeAV malware
http://oitbaon.com/A3B6vBOe
FakeAV malware
http://napiyib.com/nQ0mJKoZ
FakeAV malware
http://kozqtlp.com/vGeO0XmX
Xfinity phish
http://ugokulc.com/KJM89Mem
USPS phish
http://iqbyojt.com/KJeQ09Vp
FakeAV malware
http://sobagiw.com/nQVA0bVp
Xfinity phish
http://oosjrjt.com/GRG8ML9n
FakeAV malware
http://xqzfnuh.com/ZjgL4GbE
Xfinity phish
http://tecvxzo.com/5aannZO7
Google phish

I notified fly.io’s abuse team about the problem. And notified them. And notified them. And notified them. Each time, I received an identical reply, from a guy calling himself “Matt Braun,” saying only “I have let our customer know. Thanks!”

Matt Braun doesn’t appear to have grasped that their customer is the phisher. And lately, I haven’t even received these replies; they haven’t acknowledged recent abuse reports in days. Meanwhile (of course) all the links remain active because (of course)…their customer is the phisher.

Okay, so how does the scheme work?

I’ve spent some time mapping out the network. The quick overview:

A text message is mass broadcast, advertising a URL on fly.io.
Marks who click on the link in the message are redirected to a site called “track.palersaid.com,” hosted on Amazon AWS. Track.palersaid.com looks at the incoming fly.io URL, the type of computer or smartphone you’re using, and probably other stuff, then sends you on to another site.
This site, track.hangzdark.com, is another tracking and redirection site also hosted on Amazon AWS.
From there, marks are redirected to the actual target site, which might be a fake FedEx page, a fake UPS page, a fake “virus scan” page, or more. There are a lot of these destinations: read.messagealert.com, kolakonages.com, aca.trustedplanfinder.com, and more. Some of these destination sites are, no surprise here, hosted on Namecheap, which is in my opinion one of the scuzziest of malware and spam sewer hosts.

Example destination page

How the network works

This bears a strong resemblance to some of the malware and spam networks I’ve mapped out in the past, though the delivery network (SMS text messages) and the web service provider (fly.io) are different.

If you get these text messages, do not follow the links. If you are also seeing these messages, please let me know in a comment! I would love to know how big this network is. Fly.io seems reluctant to shut down these phishers, which leads me to wonder if they aren’t making quite a bit of money from them.

[Edit 11-Jan-2023] I’ve received a reply from Fly.io’s Abuse team:

Thank you for your patience with us over the holiday, and some follow up details.

Usually, when we have reports of spammer or abuser on our platform, our internal systems have a host of signals that we can look to to verify the report and take the appropriate action. In the vast majority of cases the signals are clear and unequivocal. However, in this instance, the signals were entirely the opposite: all signs pointed to a seemingly-legitimate user.

Our systems are set up for “either you are a customer or you are not”, and banning a customer would mean immediate and irrevocable loss of that’s customers data. That’s is not a risk we take lightly so we were not going to flip the switch and risk blowing away someone’s information without a smoking gun. I expect you and I have both seen dozens of those posts on Hacker News or elsewhere where an innocent user writes “Company has deleted my entire account without warning and I’ve lost years of data”. We don’t want to do that to someone.

So where does that leave us? The apparent reason for the behavior/signal disconnect is that it was our customer’s customer doing the abuse. Our customer has committed to evicting their customer today which should put an end to the redirection through our systems (though, unfortunately, I don’t expect that’ll have any impact on the SMS spam). If it doesn’t resolve things, let us know. We’re back online after the holiday and more in a position to chase things things down.

Additionally, there were two other concerns we need to address internally:
1) We don’t have the ability to suspend users. This is something that I’m going to pursue as we need something more nuanced than our all-or-nothing approach so that we’re able to move on complaints sooner without risk of harming someone innocently caught in the middle of things.
2) We did not follow up with the customer as often as we should have after their initial acknowledgement of the problem and indication that they would address it. That’s a coordination process breakdown exacerbated by people taking time off during the holidays and not having the usual “obviously-abuse” signals. Additionally, we need to come up with an approach to our abuse ticketing system that allows for long-lived cases.

You can email me, personally, if you feel you aren’t getting attention on this (email redacted) and I’m sincerely sorry for the delay in letting you know where things stood or getting things sorted with the customer.

It seems Fly.io is one of the good guys.

The spam stopped for a few days, though it has resumed again. This time, the SMS spam domains are hosted on Alibaba rather than Fly.io.

A World of Sh*t

Posted on December 10, 2023 by tacit

I keep, on my phone, a list of books I want to write. There’s something wrong with it; somehow, every time I finish a book, I discover the list has grown longer, not shorter. (Side note: You can tell someone’s an amateur whaen they say “I don’t want to show my book to an editor or publisher because I’m afraid someone will steal my idea.” Nah bruh, ideas are worthless, and we all have too many ideas of our own to be interested in yours. The bitter truth of writing is it’s almost impossible to get anyone interested in your book in the first place!)

One of the books on the to-be-written list is a nonfiction work titled A World of Sh*t: Normalizing bad design and lazy craftsmanship. Because man, there’s a ton of it out there.

The way I imagine the book’s title

As I sit here in my parents’ house in Florida, I find myself particularly annoyed by the bad, lazy, incompetent, “we didn’t think this through” design around me.

There’s a term that describes a lot of this crap: “psychic litter.” The expression was coined by David Joiner in the 1990s, to describe small acts of immorality that fall beneath the threshold of conscious awareness.

Take, for example, the Windows installer. It takes a while to install Windows, especially older versions. A lot of that time is spent building the Registry. The Windows installer designers could have pre-built a Registry in the installer itself, which would save almost half an hour on each install, but chose not to because it would mean taking an extra half an hour of their time to build the installer. So rather than spending the half an hour on their end, they chose to waste thousands of man-hours of other people’s time.

This kind of selfishness and lack of care is the essential beating heart of a lot of sh*t design.

Take my parents’ kitchen faucet (please!).

It’s pretty. It’s sleek.

It doesn’t move.

You literally cannot rotate it between the two sinks, which is, you know, one of the most basic of all faucet functions. It doesn’t turn. At all. They have two sinks, but you can only use the faucet with one of them.

Worse, it’s also a sprayer; the entire faucet removes. Clever, except that it does not, and has never, docked correctly. It has a plastic ring on the faucet that fits a plastic sleeve on the base, but the ring is too large; it doesn’t fit. (I imagine the fact that it’s a sprayer is the reason it can’t rotate, and that would be absolutely perfect for a three-armed user.)

And then there’s this marvel of engineering:

This is the steering-wheel-mounted remote for the car stereo in my parents’ truck, a Toyota Tacoma.

Steering-wheel-mounted remotes for a car stereo are a brilliant idea. And they’re really not that complex. They move the most often-used functions to a place where you need not look away from the road or take your hands off the steering wheel to use them.

This control has four primary buttons: left, right, up, down. Now, thinking about what it’s supposed to do (work a CD player/Bluetooth combo), you might reasonably expect that left and right go to previous and next track, and up and down raise and lower the volume.

And you’d be 100% wrong.

Left skips back 10 seconds in the current track. (Yes, seriously.) Right skips forward 10 seconds. Up goes to the next track, down goes to the previous track.

What about volume? How do you adjust the volume?

You don’t. There are no volume controls on the steering wheel. To change the volume, you have to take your hands off the steering wheel.

Yes, you read that right. They literally believed that forward 10 seconds/back 10 seconds was so important it should be on the steering wheel, but volume? Eh. Who uses the volume controls, anyway?

Every single digital music player I’ve ever used, from the Radio Shack Compact Disc Player CD-1000 my parents got in 1984 to my iPhone today, uses left and right arrows for previous and next tracks. But whatever Toyota intern who designed the car stereo controls, having apparently never used or indeed seen an entertainment sound system before, had his own ideas, and somehow, somehow it passed all the design review steps. Somehow, someone signed off on manufacture.

Skip ahead ten seconds yes, volume control no.

And here’s the thing:

The world we live in today, our world of marvels and miracles, is filled with examples like this.

It’s hard not to believe that the vast majority of industrial designers are anything but lazy and barely competent, unwilling or unable to put any effort into their job (and it certainly feels like they never use the things they design). From consumer electronics to furniture to software to clothing, we live in a universe of shit.

My jacket has a zipper edged by a hem that is exactly the right width to catch the slider as it moves. It is not possible to zip or unzil the jacket without the hem catching the slider at least three times.

Someone designed that. It went through several review steps before it was released to manufacture. And yet, neither the designer nor any of the peple resonsible for reviewing the design ever put the jacket on. (I’m serious when I say you cannot zip or unzip it without catching the slider. Even one test would’ve been enough.)

We live, we exist in a world of sh*t. We don’t pay attention to the way design impacts our lives, and as a result, trivial design failures—failures that can easily be corrected in minutes during the design stage—waste countless person-years of time. In some cases, like car stereos with cluttered or counterintuitive layouts, they kill people.

And we as a society are remarkably okay with that.

I’m not sure what changed, but in the last five years or so, I’ve found it increasingly difficult not to notice shitty design all around me. And once you’ve started to see it, it snowballs. You can’t un-see it.

I would like to live in a world where perhaps people cared about design more. But the problem seems to be getting worse, not better.

The cost of your cat pictures

Posted on November 20, 2022 by tacit

Every month, almost three billion people use Facebook.

Those people upload photos and video and it all gets saved—about 4 petabytes, four billion gigabytes, of data every day.

Those are abstract numbers. What does it mean? How Does Facebook not run out of space?

Exactly how you think. They buy more than 1,000 hard drives every day. (As of the time I write this, the information I can find suggests they prefer to use 4TB hard drives rather than larger drives for cost and reliability reasons.)

This is a pallet of 180 hard drives:

Facebook adds the equivalent of about 6 of these pallets of hard drives to its storage hive every day. They’re placed in server computers in Facebook’s Hive data store that have 12 hard drives per server, so they’re adding data equivalent to at least 83 servers per day. (That’s only for storing user generated data like photos, and does not include extra drives for RAID redundancy or data duplication, which I imagine likely doubles that amount.)

Here’s the inside of one of Facebook’s data centers.

Imagine building after building, row after row of these. Now imagine 6 pallets of hard drives coming in on trucks and 83 servers’ worth of storage being added today.

And again tomorrow.

And again the day after tomorrow.

And again after that.

And yes, they really do order hard drives by the truckload.

This is why any time some conservative tells you “BuT fAcEbOk iS vIoLaTiNg My FrEeDuMb Of SpEeCh SoCiAl MeDiA sItEs ArE pUbLiC sPaCeS DuRr DuRr,” you can laugh in their face and walk away.

See all those servers? See all those buildings? See all those pallets of hard drives being trucked in? See all those people installing them?

Are you paying for them? No. Is the government paying for them? No. Is public money paying for them? No. They are private property. Billions of dollars of private property.

Facebook spends, as a first order approximation, about $30,000,000,000 a year on server infrastructure, not including buildings, land, facilities maintenance, installation, or salaries.

Anyone who thinks that social media sites are “public spaces” is welcome to propose that Congress gives Facebook $30,000,000,000 a year to keep up that infrastructure. Otherwise, no, it’s not. That’s $30,000,000,000 a year in private money being used to buy private property.

Okay, so.

You can’t have a service where almost three billion people communicate without having tremendous political clout. Facebook can, and arguably has, influenced elections and changed the course of nations.

And that’s (rightly, I think) got a lot of people worried. When you have a private company with no public accountability that has that much influence, that’s a bad thing, right?

Well, yes.

But here’s the thing: This isn’t new.

People forget this isn’t new. It’s always been this way. In the 1700s and 1800s, elections were decided by newspaper barons.

Remember William Randolph Hearst? Remember the Spanish American War? That was a war basically started by one man, a newspaper mogul, who totally dominated public political discourse and established a whole new world of journalistic propaganda.

This is probably the most effective fake news in history.

So what’s different?

Ah, now that’s a question.

Modern social media is different from the media empires of old in one important way: they are participatory, many-to-many, not one-to-many. In the past, “media” meant the owner disseminated information to content consumers. Today, we are all content creators and content consumers.

And this has led to a great deal of confusion betwixt “public” and “private.”

The Internet allows anyone to use it, but few people actually know how it works, or what scale it operates on. Hundreds of companies spend billions per year on the infrastructure to give everyone a way to communicate with everyone else, so what feels like a public square is actually a private space. And that leads to confusion: “Facebook banned me! My CoNsTiTuTiOnAl RiGhTs!“…when in fact you have no right to use other people’s stuff for free at all.

And make no mistake, that’s what Facebook and Twitter and all those other sites are: other people’s stuff. Billions and billions of dollars of other people’s stuff, that you’re using for free.

In the past, this confusion didn’t exist. In the past, nobody felt they had the right to someone else’s newspaper. You could write a letter to the editor, which they might or might not print, but nobody (well, nobody serious, anyway) had the notion that they had the Constitutional right to use someone else’s newspaper to say whatever they want.

We understand when something belongs to someone else, right up until the moment we’re allowed to use it ourselves…at which point we tend to assume an entitlement to it.

Owners of of media distribution companies have always had an outsized impact on social media. This isn’t new.

What’s new is that people are more aware of it, and want more of a voice. What’s unfortunate is that so many people aren’t going about it the right way. You don’t have a right to use Facebook, and if you’re kicked off you aren’t being “censored.”

What we need is entirely different conversation, and that’s one we can’t have whilst everyone is looking at the wrong thing.

Some Thoughts on Ethics in Computer Science

Posted on September 27, 2022 by tacit

As I type this, Elon Musk says the release of true full self-driving cars is perhaps months away. Leaving aside his…notable overoptimism, this would arguably be among the crowning achievement of computer science so far.

Yes, even more than going to the moon. Going to the moon required remarkably little in the way of computer science—orbital mechanics are complicated, sure, but not that complicated, and there are few unexpected pedestrians twixt hither and yon.

We are moving into a world utterly dominated by computers. And yet…there’s far too little attention, I think, paid to the ethical implications of that.

Probably the biggest ethical issue I see in computer science right now concerns training sets for machine learning.

I don’t mean machine learning like self-driving cars, though that’s a monstrous problem of its own—the thing about ML systems is they’re basically black boxes that most definitively do NOT see the world the way we do, so they can become confused by adversarial inputs, like this strange sticker that makes a Tesla see a stop sign as a Speed Limit 45 sign:

And of course ownership of these enormous ML systems opens whole cans, plural, of worms just by itself. If you use terabytes of public domain data to train a proprietary ML system, what responsibility do you have to make it available to the people who produced your training data, and what liability do you have when your system goes wrong?

And of course deepfake software can produce photos and video of you in a place you’ve never been hanging out with people you don’t know saying things you’ve never said, and make it all totally believable. Look for that to cause trouble soon.

But those are just fringe problems, minor ethical quibbles compared to the elephant in the room: bias in data sets.

Legal societies are putting more and more reliance on ML systems. Police use machine learning for facial recognition (and now, gait recognition, recognizing people by the way they walk instead of their facial features—yes that’s a thing).

Militaries use facial recognition in weapons platforms. Right now they don’t rely in it, but use it as an adjunct to more traditional intelligence, but the day is coming when foreign military targets will be designated entirely by things like facial recognition systems.

Problem is, those ML systems tend to be racist AF.

I’m not kidding.

ML systems rely on training with positively gargantuan training data sets. You train a machine to recognize faces by feeding it millions—or, if you can, tens or hundreds of millions—of pictures of people’s faces.

The researchers who define and build these systems tend to turn to the Internet for their training data, trolling Internet social media sites with bot software that hoovers up all the photos it can find for training data.

https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921

Let’s ignore for a moment the copyright implications. Let’s ignore the ethical issues of treating people’s property as fodder for surveillance systems. Let’s just quietly ignore all those ethical problems so we can talk about racism.

People who post lost of photos on the Internet that get scooped up to train ML systems aren’t an even demographic slice of humanity. They tend to have more money than average, be Western Europeans or North Americans, and tend to be white.

That means ML systems to do things like facial recognition are trained on data sets that are overwhelmingly…white faces.

And that means these systems—even commercial systems now deployed and used by police—suck at identifying black or Asian faces.

Like sometimes really suck.

Like, as of 2018, commercial facial recognition systems had a recognition failure rate on white male faces of 0.2%, and a failure rate on black female faces of more than 22%.

If that isn’t ringing alarm bells in your head, you’re not fully comprehending the magnitude of the problem.

Here’s a rather frightening quote:

Last year, the American Civil Liberties Union (ACLU) found that Amazon’s Rekognition software wrongly identified 28 members of Congress as people who had previously been arrested. It disproportionately misidentified African-Americans and Latinos.

https://www.theguardian.com/technology/2019/jul/29/what-is-facial-recognition-and-how-sinister-is-it

We train AI and ML systems with data from the real world…and we get deeply racist, profoundly flawed AI and ML systems as a result.

Then we use these AI and ML systems to identify people for arrest or to send autonomous suicide drones.

https://reason.com/2021/06/01/autonomous-slaughterbot-drones-reportedly-attack-libyans-using-facial-recognition-tech/

Any machine learning system is only as good as the training data it’s developed with. And when that training data is a snapshot of the Internet, well…

This blog post was adapted from an answer I wrote on Quora. Want more like this? Follow me on Quora!

Hacking as a tool of social disapproval

Posted on June 23, 2022 by tacit

“The street finds its own uses for things.” —William Gibson, Burning Chrome

Last year, my wife, my co-author, and I launched a new podcast, The Skeptical Pervert. We talk about sex…and more specifically, we talk about sex through a lens of empiricism and rationality.

The Skeptical Pervert’s website runs WordPress. Now, I’ve been around the block a few times when it comes to web security, and I know WordPress tends to be a rather appetizing target for miscreants, so I run hardened WordPress installs, with security plugins, firewalls that are trained on common WordPress attack vectors, and other mitigations I don’t talk about openly.

I run quite a few WordPress installs. My blogs on franklinveaux.com and morethantwo.com run WordPress. So does the Passionate Pantheon blog, where Eunice and I discuss the philosophy of sex in a far-future, post-scarcity society. In addition, I host WordPress blogs for friends, and no, I won’t tell you who they are, for reasons that will soon become clear.

I automatically log hack attacks, including failed login attempts, known WordPress exploits, and malicious scans. I run software that emails me daily and weekly statistics on attacks against all the WordPress sites I own or host. I also subscribe to WordPress-specific infosec mailing lists, so I am aware of the general threat background.

Because WordPress is such a common target—it’s the Microsoft Windows of the self-hosted blog world, with everything that implies—any WordPress site will get a certain low level of constant probes and hack attempts. It’s just part of the background noise of the Internet. (If you run WordPress and you’re not religiously on top of security updates, by the way, you’ve already been pwn3d. I can pretty much guarantee it.)

The fact that I host WordPress sites not connected with me to the outside world gives me a good general baseline reading of this background noise, that I can use to compare to hack attacks against sites that are publicly connected with me.

And the results…well.

In all the years I’ve been on the Web—and I started running my own Web sites in the mid-1990s—I have never seen anything even remotely close to the constant, nonstop barrage of attacks against the Skeptical Pervert site. Joreth and Eunice are probably quite sick of my frequent updates: “Well, the firewall shows over a thousand brute-force hack attempts against the Skeptical Pervert site so far today and it isn’t even noon yet” (seriously, that’s a thing that happened recently).

Here’s a graph showing what I mean. This graph covers one week, from June 13, 2022 to June 20, 2022. The “baseline” in the graph is an average of several WordPress sites I host that aren’t in any way connected to me in the eyes of the Internet at large—I don’t run them, I don’t put content on them, my name isn’t on them, I merely host them.

Note that the attacks don’t scale with traffic; the More Than Two blog has the most traffic, followed by franklinveaux.com, then the Passionate Pantheon blog, then the Skeptical Pervert.

So what to make of this?

Part of it is likely the long-running social media campaign my ex has been running. Attacks on franklinveaux.com and morethantwo.com increased in the wake of her social media posts.

But that doesn’t explain what’s happening with the Skeptical Pervert, which has turned out to be targeted to an extraordinary degree.

Now, I don’t know who’s attacking the site, or why, so this is speculation. It’s hard to escape the idea, though, that when a site and podcast explicitly about sex, co-hosted by two women of color, talking about non-traditional sexual relationships is targeted, at least part of the answer might simply be the same old, same old tired sex-negative misogyny and racism we see…well, everywhere, pretty much. The fact that my ex doesn’t like me (and will say or do anything to get other people not to like me) doesn’t explain what’s happening here.

It’s easy to blame conservative traditionalists, but Eunice reminded me there’s another factor at work as well. The Skeptical Pervert approaches sexuality from a rational, evidence-based, skeptical lens. In the United States, there’s a stubborn streak of misogyny amongst the dudebros of the skeptics community. A podcast with two women that looks at sex from a highly female-focused, feminist point of view taking on the mantle of skepticism? It’s possible there are dudebros who will perceive that as an encroachment into their space.

In short, I don’t think this is about me. I think this is about women talking openly about real-world non-traditional sex, and getting the same pushback that women always get when they dare to do that.

If the podcast were just me, or me with obviously male co-hosts, I don’t think the level of Web attacks would be anywhere near the same.

The street finds its own uses for things. In the hands of people threatened by or frightened of non-traditional voices, the Internet has become a safe, anonymous tool of harassment.

How Facebook convinced me democracy is in trouble

Posted on December 30, 2021 by tacit

Today, in The Street Finds its Own Uses for Things:

I noticed something funny when I logged into Facebook last week. My feed, which is normally filled with ads for video games, photography gear, and complicated kits for Stirling engines you can build at home, was absolutely jam-packed with ads for far-right pro-Trump merchandise, antigovernment T-shirts and posters, gun holsters, and “conservative news” sites.

And I mean jam-packed. I’ve never seen this quantity of advertising on Facebook before; literally an ad following every single friend post.

The whole secret of advertising on Facebook is you can target your ads. You can specify exactly who you want to see your ads; for example, when we ran ads for the first porn novel we co-authored, Eunice and I targeted people with an interest in reading who were 35 or younger and lived close to a university, figuring this would likely be the sort of person interested in far-future, post-scarcity science fiction smut.

So why would Facebook, that giant creepy Hydra in the cloud, show me alt-right ads when it knows I’m a lefty Portlander?

Because the advertisers know I won’t buy their products. They don’t care. That isn’t why they’re spending tens of millions of dollars on Facebook advertising.

So first, the ads.

I’ve gotten in the habit of aggressively blocking these ads when they appear, and blocking the companies that place them. Doesn’t matter. There are a zillion other companies placing near-ident0cal ads for near-identical products, and sometimes (this is a telling bit) even with the same stock photos.

The ads look lik e this:

If you ask Facebook “why did I see this ad?”, Facebook will show you the demographic the ad was targeting. And these ads are completely ignoring the laser-focused demographics Facebook likes to brag about. They’re shotguns, not sniper rifles.

So why? What’s the point? Why target so broadly, when it increases your spend without generating sales?

So here’s the thing:

I don’t believe they’re trying to generate sales.

That’s not the point. They aren’t interested in selling you gun holsters or T-shirts. I mean, if you buy some, that’s a bonus, but I believe these ads are a propaganda effort. The purpose is to put right-wing slogans and ideas in front of as many eyeballs as possible. They’re advertising ideas, not T-shirts.

The American political right is very, very good at propaganda. Liberals sneer at “Let’s Go Brandon,” the right-wing oh-so-clever “fuck Joe Biden,” but the thing is, it works. The people who use it don’t care that it’s juvenile. It makes them feel part of something. It’s a tribal identity marker.

And human beings like feeling like part of a tribe.

The hoodie up there that says “Proud member of the LGBFJB” community? It means “Let’s Go Brandon Fuck Joe Biden.” VClever? Not really. A great identity brand for a certain kind of person? Oh yeah.

And this brand is everywhere.

Branding and marketing and propaganda matter in political discourse. Arguably they matter more than policies and proposals and all that other wonk stuff.

They want this branding everywhere, and they’re willing to pay to make that happen.

People don’t make rational decisions. People make emotional decisions and then rationalize them. Often, those emotional decisions are predicated on feelings of belonging and inclusion. The right gets that, in its creepy way. The left? Not so much.

The thing is, the political left is doing nothing to counter any of this.

Do I think this Facebook propaganda is working?

Yes. Yes, I do.

It creates the illusion that right-wing ideas are more popular than they really are. It paints a false picture of what Americal looks like and what Americans want. It lets the right dominate the discourse in ways that the left won’t even try to counter.

The modern American right is intellectually and morally bankrupt, a seething cesspool of reactionary hate. But they get propaganda. They get it on an instinctive level, in ways that confuse lefties.

And that makes them far more effective than their numbers and policies alone would suggest.

I for one welcome our new AI overlords

Posted on December 11, 2021 by tacit

I’ve been thinking a lot about machine learning lately. Take a look at these images:

These people do not exist. They’re generated by a neural net program at thispersondoesnotexist.com, a site that uses Nvidia’s StyleGAN to generate images of faces.

StyleGAN is a generative adversarial network, a neural network that was trained on hundreds of thousands of photos of faces. The network generated images of faces, which were compared with existing photos by another part of the same program (the “adversarial” part). If the matches looked good, those parts of the network were strengthened; if not, they were weakened. And so, over many iterations, its ability to create faces grew.

If you look closely at these faces, there’s something a little…off about them. They don’t look quiiiiite right, especially where clothing is concerned (look at the shoulder of the man in the upper left).

Still, that doesn’t prevent people from using fake images like these for political purposes. The “Hunter Biden story” was “broken” by a “security researcher” who does not exist, using a photo from This Person Does Not Exist, for example.

There are ways you can spot StyleGAN generated faces. For example, the people at This Person Does Not Exist found that the eyes tended to look weird, detached from the faces, so the researchers fixed the problem in a brute-force but clever way: they trained the Style GAN to put the eyes in the same place on every face, regardless of which way it was turned. Faces generated at TPDNE always have the major features in the same place: eyes the same distance apart, nose in the same place, and so on.

StyleGAN can also generate other types of images, as you can see on This Waifu Does Not Exist:

Okay, so what happens if you train a GAN on images that aren’t faces?

That turns out to be a lot harder. The real trick there is tagging the images, so the GAN knows what it’s looking at. That way you can, for example, teach it to give you a building when you ask it for a building, a face when you ask it for a face, and a cat when you ask it for a cat.

And that’s exactly what the folks at WOMBO have done. The WOMBO Dream app generates random images from any words or phrases you give it.

And I do mean “any” words or phrases.

It can generate cityscapes:

Buildings:

Landscapes:

Scenes:

Body horror:

Abstract ideas:

On and on, endless varieties of images…I can play with it for hours (and I have!).

And believe me when I say it can generate images for anything you can think of. I’ve tried to throw things at it to stump it, and it’s always produced something that looks in some way related to whatever I’ve tossed its way.

War on Christmas? It’s got you covered:

I’ve even tried “Father Christmas encased in Giger sex tentacle:”

Not a bad effort, all things considered.

But here’s the thing:

If you look at these images, they’re all emotionally evocative; they all seem to get the essence of what you’re aiming at, but they lack detail. The parts don’t always fit together right. “Dream” is a good name: the images the GAN produces are hazy, dreamlike, insubstantial, without focus or particular features. The GAN clearly does not understand anything it creates.

And still, if artist twenty years ago had developed this particular style the old-fashioned way, I have no doubt that he or she or they would have become very popular indeed. AI is catching up to human capability in domains we have long thought required some spark of human essence, and doing it scary fast.

I’ve been chewing on what makes WOMBO Dream images so evocative. Is it simply promiscuous pattern recognition? The AI creating novel patterns we’ve never seen before by chewing up and spitting out fragments of things it doesn’t understand, causing us to dig for meaning where there isn’t any?

Given how fast generative machine learning programs are progressing, I am confident I will live to see AI-generated art that is as good as anything a human can do. And yet, I still don’t think the machines that create it will have any understanding of what they’re creating.

I’m not sure how I feel about that.

Franklin Veaux's Journal

Working toward the heat death of the Universe, one joule at a time!

Tag Archives: tech

2024: The Year of Infinite Infosec Fail

Like this:

Beware Bowdlerization of Google Docs

Like this:

Donating Cycles, Making the World Better

Like this:

fly.io, SMS spam, and malware

Like this:

A World of Sh*t

Like this:

The cost of your cat pictures

Like this:

Some Thoughts on Ethics in Computer Science

Like this:

Hacking as a tool of social disapproval

Like this:

How Facebook convinced me democracy is in trouble

Like this:

I for one welcome our new AI overlords

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this: