Call to the Lazyweb: Backup

I have a problem I’ve been beating my head against for a while now, and I’ve finally given up and decided to put this out there to the hive-mind of the Internet.

I have a laptop I want to keep regularly backed up. I have external hard drives that I use to do this, one that I carry with me and one that stays in my office in Portland. I use cloning software to duplicate the contents of the laptop onto them.

But I also want to do incremental backups, Dropbox-style, to a server I own.

I do have a paid Dropbox account and I do use it. (I also have a paid Microsoft OneDrive account.) But I’d really prefer to keep my files on my own server. What I want is very simple: the file and directory structure on the laptop to be mirrored automatically on my server.

This should not be difficult. There is software that should be able to do this.
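
To make the requirement concrete, here is a minimal sketch of the behavior I mean: a one-way, incremental mirror that copies only new or changed files and keeps the same folder layout. It assumes the server is reachable as an ordinary mounted path, which sidesteps the networking and dynamic-IP problems entirely. The function name and the size-plus-timestamp comparison are my own illustration, not how any of the tools below work, and it does not handle deletions or empty folders.

```python
import shutil
from pathlib import Path
from typing import List


def mirror(source: Path, dest: Path) -> List[Path]:
    """Copy new or changed files from source into dest, keeping the same layout.

    Returns the relative paths that were copied on this run.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = dest / rel
        src_stat = src_file.stat()
        if dst_file.exists():
            dst_stat = dst_file.stat()
            # Skip files whose size matches and whose copy is at least as new.
            if (dst_stat.st_size == src_stat.st_size
                    and int(dst_stat.st_mtime) >= int(src_stat.st_mtime)):
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also copies the timestamps
        copied.append(rel)
    return copied
```

Run it twice in a row and the second run should copy nothing, which is the whole point of "incremental."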

What I have tried:

Owncloud. They no longer support Mac OS X. Apparently they ran into problems supporting Unicode filenames and never solved them, so their solution was to drop OS X support.

BitTorrent Sync. This program is laughably bad. It works fine, if you’re only syncing a handful of files. I want to protect about 216,000 files, totaling a bit over 23 GB in size. BT Sync is strictly amateur-hour; it chokes at about 100,000 files and sits there indexing forever. I’ve looked at the BT Sync forums; they’re filled with people who have the same complaint. It’s not ready for prime time.

CrashPlan. CrashPlan encrypts all files and stores them in a proprietary format; it does not replicate the file and folder structure of the client on the server. I’m using it now but I don’t like that.

rsync. It’s slow and has a lot of problems with hundreds of thousands of files. The server is also on a dynamic IP address, and rsync has no way to resolve the address of the server when it changes.

Time Machine Server. Like CrashPlan, it keeps data in a proprietary format; it doesn’t simply replicate the existing file/folder structure, which is all I want. Like rsync, it has no way to cope with changes to the server’s IP address.

So you tell me, O Internets. What am I missing? What exists out there that will do what I want?

WordPress security issues: this is a bad one, folks

It’s been a bad week for WordPress. If you’re a WordPress user, I highly recommend you check as soon as possible to ensure your site is updated, all your plugins are up to date, and your site is free of unexpected users and malicious content.

WordPress 4.4.2 was released February 2. This release fixes two known security flaws.

Hot on the heels of this security release come two worrying developments. The first, reported on over at the Wordfence blog, concerns a new WordPress attack platform that makes it easier than ever for criminals to attack WordPress sites. From the article:

The attack platform once fully installed provides an attacker with 43 attack tools they can then download, also from pastebin, with a single click. The functionality these tools provide includes:

  • Complete attack shells that let attackers manage the filesystem, access the database through a well designed SQL client, view system information, mass infect the system, DoS other systems, find and infect all CMS’s, view and manage user accounts both on CMS’s and the local operating system and much more.
  • An FTP brute force attack tool
  • A Facebook brute force attacker
  • A WordPress brute force attack script
  • Tools to scan for config files or sensitive information
  • Tools to download the entire site or parts thereof
  • The ability to scan for other attackers shells
  • Tools targeting specific CMS’s that let you change their configuration to host your own malicious code

The post includes a video of the attack platform in action.

Second, from Ars Technica, is a report of WordPress sites being hacked and made to download ransomware to visitors’ computers.

It’s not currently clear how the sites are being compromised, but it may be via an unknown zero-day security exploit. From the article:

According to a Monday blog post published by website security firm Sucuri, the compromised WordPress sites he observed have been hacked to include encrypted code at the end of all legitimate JavaScript files. The encrypted content is different from site to site…

It’s not yet clear how the WordPress sites are getting infected in the first place. It’s possible that administrators are failing to lock down the login credentials that allow the site content to be changed. It’s also feasible that attackers are exploiting an unknown vulnerability in the CMS, one of the plugins it uses, or the operating system they run on. Once a system is infected, however, the website malware installs a variety of backdoors on the webserver, a feature that’s causing many hacked sites to be repeatedly reinfected.

What can you do to protect your WordPress site? If you’re running WordPress, I strongly, strongly urge you to do the following:

  • Use strong admin passwords! I cannot emphasize this enough. Use strong admin passwords! Criminals use automated tools to scan thousands of WordPress sites an hour looking for weak passwords. A normal WordPress install will be scanned dozens to hundreds of times a day. Use strong admin passwords!
  • Update all your sites RELIGIOUSLY. When a WordPress security patch is released, criminals go to work examining the patch to see what it fixes, then develop automated tools to hack unpatched sites. You may have only 24-48 hours between when a security patch comes out and when people start using tools that will automatically compromise sites that haven’t installed the patch. Turn on automatic updates. Keep on top of your site.
  • Install a tool like Wordfence. This free plugin will protect your site by locking out people who use known attack tools or brute-force password guessing attempts. It will notify you by email of hack attempts and updates that need to be installed.
  • Install a tool like WPS Hide Login to move your login page to a hidden location, like /mysecretlogin instead of /wp-login.php. This will go miles toward securing your site.
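
If you’re not sure what a strong password looks like, one option is to let the computer pick it. Here is a minimal sketch using Python’s standard secrets module, which is meant for security-sensitive randomness. The 24-character default and the 12-character floor are my own assumptions, not WordPress requirements, and the function name is just for illustration.

```python
import secrets
import string

# Letters, digits, and punctuation; trim this if a site rejects some symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24) -> str:
    """Return a random password drawn from ALPHABET."""
    if length < 12:
        # Assumed floor for illustration: refuse anything short enough to guess quickly.
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Store the result in a password manager rather than trying to memorize it.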

I highly recommend you install the free InfiniteWP tool as well. It’s a plugin plus a Web app that will notify you of updates and allow you to update one or many WordPress sites with just one button click. This is a great way to keep on top of security patches.

Also, absolutely do not assume you’re safe because you’re an obscure little blog that nobody cares about. The criminals will still find you. They use totally automated tools to scan for vulnerable WordPress sites looking for installations to exploit. It doesn’t matter if only you and your mom know about your site–criminals will find it and will exploit it.

Stay safe!

Update #5 on the sex toy you can feel

A lot of folks have been emailing me asking for an update to the project I am working on to build a strapon dildo covered with sensors that the wearer can actually feel.

I’ve been spending so much time working on this project that it’s been hard to keep up with blogging about it. I’m currently halfway done with a stage 2 prototype that’s way more advanced than the first prototype, and I’m excited about getting it done. We’ve also been talking to several amazing mentors and investors who are helping to make this thing a reality.

I also had overwhelming responses to our surveys–apologies to everyone who wanted to do a Skype or in-person interview I wasn’t able to schedule. The schedule filled up within minutes of putting out the call for interviews. I did a survey that got more than 1,700 responses and dozens of in-person and Skype interviews, which told us that there are lots of people who want this device.

I am really excited. I’ve had really positive conversations with a couple of investors in the last couple of weeks and hope to have all kinds of good news for everyone soon.

Want to keep up with developments? Here’s a handy list of blog posts about it:
First post
Update 1
Update 2
Update 3
Update 4

#WLAMF no. 36: Antique Calculators

I first went off to college in 1984. (I say “first” because I’ve had a somewhat checkered college career, with many false starts.) On the occasion of my going off to school, to learn (or so I thought) computer engineering, I got myself a programmable calculator: a Radio Shack EC-4004.

I’ve moved rather a lot since then, but somehow, and without any deliberate intention on my part, that calculator seems to have stuck with me…kind of like a cursed ring in an old Dungeons & Dragons game, with less eternal suffering and more calculating definite integrals. (Yes, it could do that.)

I found the calculator a few days back, while I was digging through a drawer looking for a roll of tape. It’s been through a lot; it’s covered with dust, and I have a hazy memory of spilling a shot of apple schnapps on it at some point in the past.

I flicked the power switch, not expecting a lot, and…it worked! The batteries, which have never been replaced and are now old enough to vote and drink alcohol, still worked a treat.

As powerful as modern smartphones and similar devices are, there’s no chance they’ll still work after a similar amount of time. Flash memory is cheap but transient, and loses information over time. Modern lithium ion batteries degrade over time. Leave an iPhone in a drawer for twenty years and it will be a paperweight on the other side.

This old calculator has a paltry amount of processing power compared even to a modern watch, but you gotta admire the way it just keeps going.


I’m writing one blog post for every contribution to our crowdfunding we receive between now and the end of the campaign. Help support indie publishing! We’re publishing five new books on polyamory in 2015.

#WLAMF no. 9: Fusion

A lot of the world’s social, economic, and resource problems are, when you come down to it, power problems. I don’t mean political power; I mean energy. Electricity.

Take fresh water, for instance. Three-quarters of the planet’s surface is covered by water, yet much of the world doesn’t have reliable access to safe, clean water. 780 million people don’t have regular access to clean water. Nearly four million people die each year from waterborne illness.

If we had unlimited quantities of cheap, clean energy, water would stop being a problem overnight. It’s easy to desalinate seawater…easy, but not cheap. The process requires enormous inputs of energy, and energy is expensive.

The holy grail of energy is, and has always been, fusion power. Fusion power offers vast quantities of energy from seawater…if we can make it work. And we’ve been chasing it for a while, though never with any serious determination; the world’s annual budget for fusion research is about 1/18th the annual revenue of the National Football League. (In the US, the annual budget for fusion research is less than what the Government Accountability Office spends on paperwork.) Fusion power promises one-stop shopping for reversing global carbon emissions, improving access to fresh water all over the world, raising the standard of living for developing nations, moving toward non-polluting transportation…

…if we can make it work.

It’s been a long road. A lot of engineers thought we’d have the problem licked by the mid-1960s. Here we are in 2014, and it’s only been in the last two years that teams at MIT and Lawrence Livermore have actually made fusion reactors that produce net positive energy…for short periods of time. It’s a very, very difficult nut to crack.

Enter Lockheed Martin.

Lockheed Martin recently announced that their Skunkworks team has been quietly, and secretly, working on fusion power for a while. And they claim to be within 5 years of an operating prototype of a compact fusion reactor.

Now, I am of two minds about this.

Pros:

– It’s the fucking Lockheed Martin fucking Skunkworks. These are not a bunch of cranks, kooks, or pie-in-the-sky dreamers. These guys built the SR-71 in the early 1960s, and the F-117 Stealth fighter back when the Radio Shack TRS-80 was the state of the art for personal computers.
– Lockheed doesn’t seem the kind of company to stake their reputation on a claim unless they’re really, really sure.
– They’re exploring deuterium-tritium fusion, which is a lot easier than ordinary hydrogen-hydrogen fusion of the sort that happens in the sun.
– Did I mention it’s the fucking Lockheed Martin fucking Skunkworks? They have money, engineering expertise, and problem-solving experience by the metric ton. They are accustomed to solving hard engineering problems 20 years before anyone else in the world even knows they can be solved.

Cons:

– Fusion is hard. The pursuit of fusion has left a lot of broken dreams in its wake.
– The design they propose encloses a set of superconducting magnets inside the fusion chamber. That’s clever, and solves a lot of problems with magnetic containment, but superconducting magnets are fragile things and the inside of a fusion chamber is as close as we can get to hell on earth.
– Fusion creates fast neutrons. Those fast neutrons tend to run into stuff and knock it all out of whack. Solving the problem of the reactor vessel degrading under intense neutron flux is non-trivial; in fact, that’s one of the key objectives of the multibillion-dollar International Thermonuclear Experimental Reactor being built in France by a consortium of countries.

Fusion power, if we can make it work, would likely (and without hyperbole) be one of the most significant achievements of the human race. It could and very likely would have farther-reaching impacts than the development of agriculture or the advent of ironworking, and would improve the standard of living for billions of people to a greater extent than any other single invention.

For that reason alone, I think it’s worth pursuing. I’d like to see it better funded…say, maybe even on the same scale as the NFL. I’m not sure if Lockheed can deliver what they’re promising, but I am very, very happy they’re in the race.


I’m writing one blog post for every contribution to our crowdfunding we receive between now and the end of the campaign. Help support indie publishing! We’re publishing five new books on polyamory in 2015: https://www.indiegogo.com/projects/thorntree-press-three-new-polyamory-books-in-2015/x/1603977

Some thoughts on machine learning: context-based approaches

A nontrivial problem with machine learning is organization of new information and recollection of appropriate information in a given circumstance. Simple storing of information (cats are furry, balls bounce, water is wet) is relatively straightforward, and one common approach to doing this is simply to define the individual pieces of knowledge as objects which contain things (water, cats, balls) and descriptors (water is wet, water flows, water is necessary for life; cats are furry, cats meow, cats are egocentric little psychopaths).

This presents a problem with information storage and retrieval. Some information systems that have a specific function, such as expert systems that diagnose illness or identify animals, solve this problem by representing the information hierarchically as a tree, with the individual units of information at the tree’s leaves and a series of questions representing paths through the tree. For instance, if an expert system identifies an animal, it might start with the question “Is this animal a mammal?” A “yes” starts down one side of the tree, and a “no” starts down the other. At each node in the tree, another question identifies which branch to take: “Is the animal four-legged?” “Does the animal eat meat?” “Does the animal have hooves?” Each path through the tree is a series of questions that leads ultimately to a single leaf.
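
A tree of this kind is easy to sketch in code. The following is a toy illustration only: the Node class, the identify function, and the particular animals at the leaves are assumptions made up for this example, not taken from any real expert system.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    question: Optional[str] = None  # asked at interior nodes
    yes: Optional["Node"] = None    # branch taken on a "yes"
    no: Optional["Node"] = None     # branch taken on a "no"
    answer: Optional[str] = None    # set only on leaves


def identify(node: Node, answers: Dict[str, bool]) -> str:
    """Follow yes/no answers from the root until a leaf is reached."""
    while node.answer is None:
        node = node.yes if answers[node.question] else node.no
    return node.answer


# Hypothetical tree; the leaves are illustrative.
TREE = Node(
    question="Is this animal a mammal?",
    yes=Node(
        question="Does the animal have hooves?",
        yes=Node(answer="horse"),
        no=Node(
            question="Does the animal eat meat?",
            yes=Node(answer="cat"),
            no=Node(answer="rabbit"),
        ),
    ),
    no=Node(answer="lizard"),
)
```

Each call to identify walks exactly one path, so answering "yes, no, yes" to the three questions lands on the "cat" leaf and nowhere else.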

This is one of the earliest approaches to expert systems, and it’s quite successful for representing hierarchical knowledge and for performing certain tasks like identifying animals. Some of these expert systems are superior to humans at the same tasks. But the domain of cognitive tasks that can be represented by this variety of expert system is limited. Organic brains do not really seem to organize knowledge this way.

Instead, we can think of the organization of information in an organic brain as a series of individual facts that are context dependent. In this view, a “context” represents a particular domain of knowledge—how to build a model, say, or change a diaper. There may be thousands, tens of thousands, or millions of contexts a person can move within, and a particular piece of information might belong to many contexts.

What is a context?

A context might be thought of as a set of pieces of information organized into a domain in which those pieces of information are relevant to each other. Contexts may be procedural (the set of pieces of information organized into necessary steps for baking a loaf of bread), taxonomic (a set of related pieces of information arranged into a hierarchy, such as knowledge of the various birds of North America), hierarchical (the set of information necessary for diagnosing an illness), or simply related to one another experientially (the set of information we associate with “visiting grandmother at the beach”).

Contexts overlap and have fuzzy boundaries. In organic brains, even hierarchical or procedural contexts will have extensive overlap with experiential contexts—the context of “how to bake bread” will overlap with the smell of baking bread, our memories of the time we learned to bake bread, and so on. It’s probably very, very rare in an organic brain that any particular piece of information belongs to only one context.

In a machine, we might represent this by creating a structure of contexts CX (1,2,3,4,5,…n) where each piece of information is tagged with the contexts it belongs to. For instance, “water” might appear in many contexts: a context called “boating,” a context called “drinking,” a context called “wet,” a context called “transparent,” a context called “things that can kill me,” a context called “going to the beach,” and a context called “diving.” In each of these contexts, “water” may be assigned different attributes, whose relevance is assigned different weights based on the context. “Water might cause me to drown” has a low relevance in the context of “drinking” or “making bread,” and a high relevance in the context of “swimming.”
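
One way to picture this structure is as a nested mapping from concept to context to attribute to weight. This is a minimal sketch under my own assumptions: the weights are invented numbers between 0 and 1, and the names FACTS, relevance, and contexts_of are hypothetical.

```python
from typing import Dict, List

# facts[concept][context][attribute] -> relevance weight between 0 and 1
Facts = Dict[str, Dict[str, Dict[str, float]]]

# Illustrative weights only; nothing here is measured.
FACTS: Facts = {
    "water": {
        "drinking": {"is necessary for life": 0.9, "might cause me to drown": 0.05},
        "making bread": {"is wet": 0.7, "might cause me to drown": 0.01},
        "swimming": {"is wet": 0.8, "might cause me to drown": 0.95},
    },
}


def relevance(facts: Facts, concept: str, attribute: str, context: str) -> float:
    """Weight of an attribute of a concept within one context (0.0 if unknown)."""
    return facts.get(concept, {}).get(context, {}).get(attribute, 0.0)


def contexts_of(facts: Facts, concept: str) -> List[str]:
    """All contexts a concept has been tagged with, in alphabetical order."""
    return sorted(facts.get(concept, {}))
```

The same attribute, "might cause me to drown," comes back with a high weight when the context is "swimming" and a low one when it is "drinking," which is the behavior described above.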

In a contextually based information storage system, new knowledge is gained by taking new information and assigning it correctly to relevant contexts, or creating new contexts. Contexts themselves may be arranged as expert systems or not, depending on the nature of the context. A human doctor diagnosing illness might have, for instance, a diagnostic context that behaves similarly in some ways to a diagnostic expert system; a doctor might ask a patient questions about his symptoms, and arrive at her conclusion by following the answers to a single possible diagnosis. This process might be informed by past contexts, though; if she has just seen a dozen patients with norovirus, her knowledge of those past diagnoses, her understanding of how contagious norovirus is, and her observation of the similarity of this new patient’s symptoms to those previous patients’ symptoms might allow her to bypass a large part of the decision tree. Indeed, it is possible that a great deal of what we call “intuition” is actually the ability to make observations and use heuristics that allow us to bypass parts of an expert system tree and arrive at a leaf very quickly.

But not all types of cognitive tasks can be represented as traditional expert systems. Tasks that require things like creativity, for example, might not be well represented by highly static decision trees.

When we navigate the world around us, we’re called on to perform large numbers of cognitive tasks seamlessly and to be able to switch between them effortlessly. A large part of this process might be thought of as context switching. A context represents a domain of knowledge and information—how to drive a car or prepare a meal—and organic brains show a remarkable flexibility in changing contexts. Even in the course of a conversation over a dinner table, we might change contexts dozens of times.

A flexible machine learning system needs to be able to switch contexts easily as well, and deal with context changes resiliently. Consider a dinner conversation that moves from art history to the destruction of Pompeii to a vacation that involved climbing mountains in Hawaii to a grandparent who lived on the beach. Each of these represents a different context, but the changes between contexts aren’t arbitrary. If we follow the normal course of conversations, there are usually trains of thought that lead from one subject to the next; and these trains of thought might be represented as information stored in multiple contexts. Art history and Pompeii are two contexts that share specific pieces of information (famous paintings) in common. Pompeii and Hawaii are contexts that share volcanoes in common. Understanding the organization of individual pieces of information into different contexts is vital to understanding the shifts in an ordinary human conversation; where we lack information—for example, if we don’t know that Pompeii was destroyed by a volcano—the conversation appears arbitrary and unconnected.
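
The dinner conversation can be sketched with contexts as sets of facts, where a shift makes sense only if two contexts share something. Famous paintings and volcanoes come from the example above; every other set member, including "beaches" as the link to the grandparent, is my own assumption for illustration, as are the names CONTEXTS, bridge, and explain.

```python
from typing import Dict, List, Set, Tuple

# Each context is a set of the pieces of information it contains.
CONTEXTS: Dict[str, Set[str]] = {
    "art history": {"famous paintings", "museums"},
    "Pompeii": {"famous paintings", "volcanoes", "Roman Empire"},
    "Hawaii vacation": {"volcanoes", "mountain climbing", "beaches"},
    "grandparent at the beach": {"beaches", "family"},
}


def bridge(contexts: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Pieces of information two contexts have in common."""
    return contexts[a] & contexts[b]


def explain(contexts: Dict[str, Set[str]],
            conversation: List[str]) -> List[Tuple[str, str, Set[str]]]:
    """For each shift in a conversation, report what links the two topics."""
    return [(a, b, bridge(contexts, a, b))
            for a, b in zip(conversation, conversation[1:])]
```

Remove "volcanoes" from the Pompeii set and the bridge to Hawaii comes back empty: to a listener with that gap, the shift looks arbitrary.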

There is a danger in a system being too prone to context shifts; it meanders endlessly, unable to stay on a particular cognitive task. A system that changes contexts only with difficulty, on the other hand, appears rigid, even stubborn. We might represent focus, then, in terms of how strongly (or not) we cling to whatever context we’re in. Dustin Hoffman’s character in Rain Man possesses a cognitive system that clings very tightly to the context he is in!

Other properties of organic brains and human knowledge might also be represented in terms of information organized into contexts. Creativity is the ability to find connections between pieces of information that normally exist in different contexts, and to find commonalities of contextual overlap between them. Perception is the ability to assign new information to relevant contexts easily.

Representing contexts in a machine learning system is a nontrivial challenge. It is difficult, to begin with, to determine how many contexts might exist. As a machine entity gains new information and learns to perform new cognitive tasks, the number of contexts in which it can operate might increase indefinitely, and the system must be able to assign old information to new contexts as it encounters them. If we think of each new task we might want the machine learning system to be able to perform as a context, we need to devise mechanisms by which old information can be assigned to these new contexts.

Organic brains, of course, don’t represent information the way computers do. Organic brains represent information as neural traces—specific activation pathways among collections of neurons.

These pathways become biased toward activation when we are in situations similar to those where they were first formed, or similar to situations in which they have been previously activated. For example, when we talk about Pompeii, if we’re aware that it was destroyed by a volcano, other pathways pertaining to our experiences with or understanding of volcanoes become biased toward activation, and so our vacation climbing the volcanoes in Hawaii comes to mind. When others share these same pieces of information, their pathways similarly become biased toward activation, and so they can follow the transition from talking about Pompeii to talking about Hawaii.

This method of encoding and recalling information makes organic brains very good at tasks like pattern recognition and associating new information with old information. In the process of recalling memories or performing tasks, we also rewrite those memories, so the process of assigning old information to new contexts is transparent and seamless. (A downside of this approach is information reliability; the more often we access a particular memory, the more often we rewrite it, so paradoxically, the memories we recall most often tend to be the least reliable.)

Machine learning systems need a system for tagging individual units of information with contexts. This becomes complex from an implementation perspective when we recall that simply storing a bit of information with descriptors (such as water is wet, water is necessary for life, and so on) is not sufficient; each of those descriptors has a value that changes depending on context. Representing contexts as a simple array CX (1,2,3,4,…n) and assigning individual facts to contexts (water belongs to contexts 2, 17, 43, 156, 287, and 344) is not sufficient. The properties associated with water will have different weights—different relevancies—depending on the context.

Machine learning systems also need a mechanism for recognizing contexts (it would not do for a general purpose machine learning system to respond to a fire alarm by beginning to bake bread) and for following changes in context without becoming confused. Additionally, contexts themselves are hierarchical; if a person is driving a car, that cognitive task will tend to override other cognitive tasks, like preparing notes for a lecture. Attempting to switch contexts in the middle of driving can be problematic. Some contexts, therefore, are more “sticky” than others, more resistant to switching out of.
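
Stickiness can be sketched as a threshold: the system leaves its current context only when the evidence for a new one is stronger than the current context’s resistance to switching. This is a deliberately crude model under my own assumptions; the Context class, the 0-to-1 scale, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    name: str
    stickiness: float  # 0 = abandoned easily, 1 = very hard to leave


def next_context(current: Context, candidate: Context, evidence: float) -> Context:
    """Switch only when evidence for the candidate exceeds the current stickiness."""
    return candidate if evidence > current.stickiness else current


# Illustrative values: driving resists interruption, dinner chat does not.
DRIVING = Context("driving a car", stickiness=0.9)
DINNER_CHAT = Context("dinner conversation", stickiness=0.2)
LECTURE_NOTES = Context("preparing lecture notes", stickiness=0.5)
```

With these numbers, a moderate pull toward lecture notes (evidence of 0.6) is ignored while driving but wins out over dinner conversation. A system whose contexts all sit near 0 would meander; one whose contexts all sit near 1 would be the rigid, stubborn case.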

A context-based machine learning system, then, must be able to recognize and prioritize contexts. Context recognition is itself a nontrivial problem: the system must recognize the input it is provided with, assign that input to contexts, and seek the most relevant context (which may in most situations be the context with greatest overlap with all the relevant input). Assigning some cognitive tasks, such as diagnosing an illness, to a context is easy; assigning other tasks, such as natural language recognition, processing, and generation in a conversation, to a context is more difficult. (We can view engaging in natural conversation as one context, with the topics of the conversation belonging to sub-contexts. This differs from many machine conversational techniques, such as Markov chains, which can be viewed as memoryless state machines. Each state, which may correspond for example to a word being generated in a sentence, can be represented by S(n), and the transition from S(n) to S(n+1) is completely independent of S(n-1); previous parts of the conversation are not relevant to future parts. This creates limitations, as human conversations do not progress this way; previous parts of a conversation may influence future parts.)
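
The memoryless property is easiest to see in a tiny word-level Markov chain. This is a minimal sketch of the general technique, not any particular chatbot; build_chain and generate are names chosen for this example.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_chain(text: str) -> Dict[str, List[str]]:
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: Dict[str, List[str]], start: str, length: int,
             rng: random.Random) -> List[str]:
    """Generate up to `length` words; each choice looks only at the current word."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: nothing ever followed this word
        word = rng.choice(followers)  # S(n+1) depends on S(n) alone
        output.append(word)
    return output
```

Notice that generate never consults the output list when picking the next word. Whatever was said two words ago has no influence, which is exactly the limitation described above.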

Context seems to be an important part of flexibility in cognitive tasks, and thinking of information in terms not just of object/descriptor or decision trees but also in terms of context may be an important part of the next generation of machine learning systems.

Sex tech: Update on the dildo you can feel

A few months back, I wrote a blog post about a brain hack that might create a dildo the wearer can actually feel. The idea came to me in the shower. I’d been thinking about the brain’s plasticity, and about how it might be possible to trick the brain into internalizing a somatosensory perception that a strap-on dildo is a real part of the body, by using sensors along the dildo connected to tiny electrical stimulation pads worn inside the vagina.

It’s an interesting idea, I think. So I blogged about it. I didn’t expect the response I got.

I’ve received a bunch of emails about it, and had a bunch of people tell me “OMG this is the most amazing thing ever! Make it happen!”

So I have, between work on getting the book More Than Two out the door and preparing for the book tour, been chugging away at this idea. Here’s an update:

1. I’ve filed for a patent on the idea. I’ve received confirmation that the application has been accepted and the process is started.

2. I’ve talked to an electronics prototyping firm about developing a prototype. Based on feedback from the prototyping firm, I’ve modified the initial design extensively. The first version I’d thought about was based on the same principle as the Feeldoe; the redesign uses a separate dildo and harness, with an external computer to receive signals from the sensors in the dildo and transmit them to the vaginal insert. The new design looks, and works, something like this. (Apologies for the horrible animated GIF; art isn’t really my specialty.)

3. The prototyping firm has outlined a multi-step process to develop a workable, manufacturable device. The process would go something like:

Phase 1: Research and proof of concept. This would include researching designs for the sensors on the dildo and the electrodes on the vaginal insert. It would also include a crude proof-of-concept device that would essentially be nothing more than the vaginal insert connected to a computer programmed to simulate the rest of the device.

The intent at this stage is to see if the idea is even workable. What kind of electrodes could be used? Would they produce the right kind of stimulation? How densely arranged could they be? How small could they be? Would the brain actually be able to interpret sensations produced by the electrodes in a way that would trick the wearer into thinking the dildo was a part of the body? If so, how long would that somatosensory rewiring take?

Phase 2: Assuming the initial research showed the idea to be viable, the next step would be to figure out a sensor design, fabricate a microcontroller to connect the sensors to the electrodes, and experiment with sensor design and fabrication. Would a single sensor provide adequate range of tactile feedback, or would it be necessary to multiplex several sensors (some designed to respond to light touch, others to a heavier touch) together in order to provide a good dynamic range? What mechanical properties would the sensors need to have? How would they be built? (We talked about several potential designs, including piezoelectric, resistive polymer, and fluid-filled devices.) How would the sensors be placed along the dildo?

Phase 3: Once a working prototype is developed, the next step is detail design and engineering. This is essentially the process of taking a working prototype and producing a manufacturable product from it. This includes everything from engineering drawings for fabrication to choosing materials to developing the final version of the software.

So. That’s where the project is right now.

The up side? I think this thing could actually work. The down side? It’s going to be expensive.

I have already started investigating ways to make it happen. If we incorporate in Canada, we may be eligible for Canadian financial incentives designed to spur tech research and development.

The fabricating company seems to think the first phase would most likely cost somewhere around $5,000-10,000. Depending on what’s learned during that phase, the development of a fully functional prototype might run anywhere from $50,000 to $100,000, a lot of which hinges on design of the sensors, which will likely be the most challenging bit of engineering. They didn’t even want to speculate about the cost of going from working prototype to manufacturable product; too many unknowns.

I’m discussing the possibility of doing crowdfunding to get from phase 2 to 3, and possibly from phase 1 to 2. It’s not likely that crowdfunding is appropriate for the first phase, because I won’t have anything tangible to offer backers. Indeed, it’s possible that I might spend the initial money and discover the idea isn’t workable.

Ideally, I’d like to find people who think this idea is worth investigating who can afford to invest in the first phase. If you know anybody who might be interested in this project, let me know!

Also, one of the people at the prototyping company suggested the name “Hapdick.” I’m still not sure how I feel about that, but I do have to admit it’s clever.

Want to keep up with developments? Here’s a handy list of blog posts about it:
First post
Update 1
Update 2
Update 3
Update 4
Update 5
Update 6
Update 7
Update 8
Update 9

1984: How George Orwell Got it Wrong

When I was in high school, one of the many books on our required reading list in my AP English class was George Orwell’s 1984. As a naive, inexperienced teenager, I was deeply affected by it, in much the same way many other naive, inexperienced teens are deeply affected by Atlas Shrugged. I wrote a glowing book report, which, if memory serves, got me an A+.

1984 was a crude attempt at dystopian fiction, partly because it was more a hysterical anti-Communist screed than a serious effort at literature. Indeed, had it not been written at exactly the point in history it was written, near the dawn of the Cold War and just prior to the rise of McCarthyist anti-communist hysteria, it probably would not have become nearly the cultural touchstone it is now.

From the vantage point of 2014, parts of it seem prescient, particularly the overwhelming government surveillance of every aspect of citizens’ lives. 1984 describes a society in which everyone is watched, all the time; there’s a minor plot hole (who’s watching all these video feeds?), but it escaped my notice back then.

But something happened on the way to dystopia–something Orwell didn’t predict. We tend to see surveillance as a tool of oppressive government; in a sense, we have all been trained to see it that way. But it is just as powerful a tool in the hands of the citizens, when they use it to watch the government.


As I write this, the town of Ferguson, Missouri has been wracked for over a week now because of the killing of an unarmed black teenager at the hands of an aggressive and overzealous police officer. When the people of Ferguson protested, the police escalated, and escalated, and escalated, responding with tear gas, arrests, and curfews.

Being a middle-aged white dude gives me certain advantages. I don’t smoke pot, but if I did and a police officer found me with a bag of weed in my pocket, the odds I’d ever go to prison are very, very small. Indeed, the odds I’d even be arrested are small. If I were to jaywalk in front of a police officer, or be seen by a police officer walking at night along a suburban sidewalk, the odds of a violent confrontation are vanishingly tiny. So it’s impossible for me, or really for most white dudes, to appreciate or even understand what it’s like to be black in the United States.

This is nothing new. The hand of government weighs most heavily on those who are least enfranchised, and it has always been so. All social structures, official and unofficial, slant toward the benefit of those on top, and in the United States, that means the male and pale.

And there’s long been a strong connection between casual, systemic racism and the kind of anti-Commie agitprop that made Orwell famous.

It is ironic, though not unexpected, that the Invisible Empire of the Knights of the Ku Klux Klan is raising a “reward” for the police officer who “did his job against the negro criminal”.

So far, so normal. This is as it has been since before the founding of this country. But now, something is different…and not in the way Orwell predicted. Surveillance changes things.


What Orwell didn’t see, and couldn’t have seen, is a time in which nearly every citizen carries a tiny movie camera everywhere. The rise of cell phones has made citizen surveillance nearly universal, with results that empower citizens against abuses of government, rather than the other way around.

Today, it’s becoming difficult for police to stop, question, arrest, beat, or shoot someone without cell phone footage ending up on YouTube within hours. And that is, I think, as it should be. Over and over again, police have attempted to prevent people from recording them in public places…and over and over again, the courts have ruled that citizens have the right to record the police.

It’s telling that in Ferguson, the protestors, who’ve been labeled “looters” and “thugs” by police, have been the ones who want video and journalism there…and it’s been the police who are trying to keep video recording away. That neatly sums up everything you need to know about the politics of Ferguson, seems to me.

Cell phone technology puts the shoe on the other foot. And, unsurprisingly, when the institutions of authority–the ones who say “if you have nothing to hide, you have nothing to fear from surveillance”–find themselves on the receiving end rather than the recording end of surveillance, they become very uncomfortable. In the past, abuses of power were almost impossible to prosecute; they happened in dark places, away from the disinfecting eye of public scrutiny. But now, that’s changing. Now, it’s harder and harder to find those dark places where abuse thrives.

In fact, the ACLU has released a smartphone app called Police Tape, which you can start running as soon as you find yourself confronted by police. It silently (and invisibly) records everything that happens, and uploads the file to a remote server.

If those in power truly had nothing to hide, they would welcome surveillance. New measures are being proposed in many jurisdictions that would require police officers to wear cameras wherever they go. The video from these cameras could corroborate officers’ accounts of their actions whenever misconduct was alleged, if–and this is the critical part–the officers tell the truth. When I hear people object to such cameras, then, the only conclusion I can draw is they don’t want a record of their activities, and I wonder why.

William Gibson, in the dystopian book Neuromancer (published, as fate would have it, in 1984) proposed that the greatest threats to personal liberty come, not from a government, but from corporations that assume de facto control over government. His vision seems more like 1984 than 1984. He was less jaundiced than Orwell, though. In the short story Burning Chrome, Gibson wrote, “The street finds its own uses for things.” The explosion of citizen surveillance proves how remarkably apt that sentiment is.

The famous first TV commercial for the Apple Macintosh includes the line “why 1984 won’t be like 1984.” The success of the iPhone and other camera-equipped smartphones shows how technology can turn the tables on authority.

The police commissioners and state governors and others in the halls of political power haven’t quite figured out the implications yet. Technology moves fast, and the machinery of authority moves slowly. But the times, they are a-changin’. Orwell got it exactly wrong; it is the government, not the citizens, who have the most to fear from a surveillance society.

And that is a good thing.

Cloudflare: The New Face of Bulletproof Spam Hosting

…or, why do I get all this spam, and who’s serving it?

Spammers have long had to face a problem. Legitimate Web hosting companies don’t host spam sites. Almost all Web hosts have policies against spam, so spammers have to figure out how to get their sites hosted. After all, if you can’t go to the spammer’s website to buy something, the spammer can’t make money, right?

In the past, spammers have used overseas Web hosting companies, in countries like China or Romania, that are willing to turn a blind eye to spam in exchange for money. A lot of spammers still do this, but it’s becoming less common, as even these countries have become increasingly reluctant to host spam sites.

For a while, many spammers were turning to hacked websites. Someone would set up a WordPress blog or a Joomla site but wouldn’t keep on top of security patches. The spammers would use automated tools capable of scanning hundreds of thousands of sites looking for vulnerabilities and hacking them automatically, then they’d place the spam pages on the hacked site. And a lot of spammers still do this.

But increasingly, spammers are turning to the new big thing in bulletproof spam serving: content delivery networks like Cloudflare.


What is a content delivery network?

Basically, a content delivery network is a bunch of servers that sit between a traditional Web server and you, the Web user.

A ‘normal’ Web server arrangement looks something like this:

When you browse the Web, you connect directly to a Web server over the Internet. The Web server takes the information stored on it and sends it to your computer.

With a content delivery network, it looks more like this:

The CDN, like Cloudflare, has a large number of servers, often spread all over the country (or the globe). These servers make a copy of the information on the Web server. When you visit a website served by a CDN, you do not connect to the Web server. You connect to one of the content delivery network servers, which sends you the copy of the information it made from the Web server.
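To make the "copy" idea concrete, here is a minimal sketch in Python. It is a toy model and not how Cloudflare or any real CDN is built; the class names OriginServer and EdgeServer and the sample page are made up for illustration.

```python
class OriginServer:
    """Stands in for the Web server that actually holds the site."""

    def __init__(self, pages):
        self.pages = pages
        self.requests_served = 0

    def fetch(self, path):
        # Count every request that actually reaches the origin.
        self.requests_served += 1
        return self.pages[path]


class EdgeServer:
    """Stands in for one CDN server that keeps copies of origin pages."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path):
        # Only go back to the origin the first time a page is requested.
        if path not in self.cache:
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]


# Hypothetical site with a single page.
origin = OriginServer({"/index.html": "<h1>Hello</h1>"})
edge = EdgeServer(origin)

# A thousand visitors hit the edge server; the origin sees one request.
for _ in range(1000):
    body = edge.get("/index.html")

print(origin.requests_served)
```

A real CDN also throws its copies away after a while and fetches fresh ones, which this sketch leaves out.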

There are several advantages to doing this:

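1. The Web server can handle more traffic. With a conventional Web server, if too many people visit the Web site at the same time, the Web server can’t handle the traffic, and it goes down. With a CDN, most visitors are served by the CDN’s copies, so far fewer requests ever reach the Web server.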

2. The site is protected from hacking and denial-of-service attacks. If someone tries to hack the site or knock it offline, at most they can affect one of the CDN servers. The others keep going.

3. It’s faster. If you are in Los Angeles and the Web server is in New York, the information has to travel many “hops” through the Internet to reach you. If you’re in Los Angeles and the content delivery network has a server in Los Angeles, you’ll connect to it. There are fewer hops for the information to pass through, so it’s delivered more quickly.


Cloudflare and spam

Spammers love Cloudflare for two reasons. First, when a Web server is behind Cloudflare’s network, it is in many ways hidden from view. You can’t tell who’s hosting it just by looking at its IP address, the way you can with a conventional Web server, because the IP address you see is for Cloudflare, not the host.
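Here is a rough sketch of what that looks like from the outside, assuming you already know which address ranges belong to the CDN. The ranges below are placeholders taken from the blocks reserved for documentation, not Cloudflare's real ranges, and the function name behind_cdn is made up.

```python
import ipaddress

# Hypothetical address ranges, taken from the blocks reserved for documentation.
# A real check would use the ranges the CDN itself publishes.
CDN_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def behind_cdn(ip_text, cdn_ranges=CDN_RANGES):
    """Return True if the address belongs to one of the CDN's ranges."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip in net for net in cdn_ranges)


# The address a visitor sees for a site fronted by the CDN...
print(behind_cdn("203.0.113.45"))
# ...versus an address that sits outside the CDN's ranges.
print(behind_cdn("192.0.2.10"))
```

When the check comes back true, the address tells you about the CDN and nothing about the Web host sitting behind it.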

Second, Cloudflare is fine with spam. They’re happy to provide content delivery services for spam, malware, “phish” sites like phony bank or PayPal sites–basically, whatever you want.

Cloudflare’s Web page says, a little defensively, “CloudFlare is a pass-through network provider that automatically caches content for a limited period in order to improve network performance. CloudFlare is not a hosting provider and does not provide hosting services for any website. We do not have the capability to remove content from the web.” And, technically speaking, that’s true.

Cloudflare doesn’t own the Web server. They don’t control what’s on it and they can’t take it offline. So, from a literal, technical perspective, they’re right when they say they can’t remove content from the web.

They can, however, refuse to provide services for spammers. They can do that, but they don’t.


History

CloudFlare was founded by Matthew Prince, Lee Holloway, and Michelle Zatlyn, three people who had previously worked on Project Honey Pot, which was–ironically–an anti-spam, anti-malware project.

Project Honey Pot allows website owners to track spam and hack attacks against their websites and block malicious traffic. In an interview with Forbes magazine, Michelle Zatlyn said:

“I didn’t know a lot about website security, but Matthew told me about Project Honey Pot and said that 80,000 websites had signed up around the world. And I thought ‘That’s a lot of people.’ They had no budget. You sign up and you get nothing. You just track the bad guys. You don’t get protection from them. And I just didn’t understand why so many people had signed up.”

It was then that Prince suggested creating a service to protect websites and stop spammers. “That’s something I could be proud of,” Zatlyn says. “And so that’s how it started.”

So Cloudflare, which was founded by three anti-spam activists with the goal of stopping spammers, is now a one-stop, bulletproof supplier for spam and malware services.


The problem

Cloudflare, either intentionally or deliberately, has a broken internal process for dealing with spam and abuse complaints. Spamcop–a large anti-spam website that processes spam emails, tracks the responsible mail and Web hosts and notifies them of the spam–will no longer communicate with Cloudflare, because Cloudflare does not pay attention to email reports of abuse even though it has a dedicated abuse email address (that’s often unworkable, as Cloudflare has in the past enabled spam filtering on that address, meaning spam complaints get deleted as spam).

Large numbers of organized spam gangs sign up for Cloudflare services. I track all the spam that comes into my mailbox, and I see so much spam that’s served by Cloudflare I keep a special mailbox for it.

Right now, about 15% of all the spam I receive is protected by Cloudflare. Repeated complaints to their abuse team, either to their abuse email address or on their abuse Web form, generally have no effect. As I’ve documented here, Cloudflare will continue to provide services for spam, malware, and phish sites even long after the Web host that’s responsible for them has taken them down; they kept providing services for the malware domain rolledwil.biz, being used as part of a large-scale malware attack against Android devices, for months after being notified.

One of the spam emails in my Cloudflare inbox dates back to November of 2013. The Spamvertised domain, is.ss47.shsend.com, is still active, nearly a year after Cloudflare was notified of the spam. A PayPal phish I reported to CloudFlare in March of 2014 was finally removed from their content delivery network three months later…after some snarky Twitter messages from Cloudflare’s security team.

(They never did put up the interstitial warning, and continued to serve the PayPal phish page for another month or more.)

Cloudflare also continues to provide services for sites like masszip.com, the Web site that advertises pirated eBooks but actually serves up malware.

In fact, I’ve been corresponding with a US copyright attorney about the masszip.com piracy, and he tells me that Cloudflare claims immunity from US copyright law. They claim that people using the Cloudflare CDN aren’t really their concern; they’re not hosting the illegal content, they’re just making a copy of it and then distributing it, you see. Or, err, something.

I am not sure what happened within Cloudflare to make them so reluctant to terminate their users even in cases of egregious abuse, such as penis-pill spam, piracy, and malware distribution. From everything I can find, it was started by people genuinely dedicated to protecting the Internet from spam and malware, but somehow, somewhere along the way, they dropped the ball.

I wonder if Michelle Zatlyn is still proud.

Some thoughts on government funding for research

Every time you buy a hard drive, some of your money goes to the German government.

That’s because in the late 1980s, a physicist named Peter Grünberg at the Forschungszentrum Jülich (Jülich Research Center) made a rather odd discovery.

The Jülich Research Center is a government-funded German research facility that explores nuclear physics, geoscience, and other fields. There’s a particle accelerator there, and a neutron scattering reactor, and not one or two or even three but a whole bunch of supercomputers, and a magnetic confinement fusion tokamak, and a whole bunch of other really neat and really expensive toys. All of the Center’s research money comes from the government–half from the German federal government and half from the Federal State of North Rhine-Westphalia.

Anyway, like I was saying, in the late 1980s, Peter Grünberg made a rather odd discovery. He was exploring quantum physics, and found that in a material made of several layers of magnetic and non-magnetic materials, if the layers are thin enough (and by “thin enough” I mean “only a few atoms thick”), the material’s resistance changes dramatically when it’s exposed to very, very weak magnetic fields.

There’s a lot of deep quantum voodoo about why this is. Wikipedia has this to say on the subject:

If scattering of charge carriers at the interface between the ferromagnetic and non-magnetic metal is small, and the direction of the electron spins persists long enough, it is convenient to consider a model in which the total resistance of the sample is a combination of the resistances of the magnetic and non-magnetic layers.

In this model, there are two conduction channels for electrons with various spin directions relative to the magnetization of the layers. Therefore, the equivalent circuit of the GMR structure consists of two parallel connections corresponding to each of the channels. In this case, the GMR can be expressed as
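[The formula was an image and doesn’t reproduce here. Working it out from the two-channel model just described, and assuming ρF↑ and ρF↓ (my labels, not part of the quoted text) stand for the resistivity of the magnetic layers for the two spin directions, it should look something like this:]

```latex
\delta_H = \frac{\Delta R}{R}
         = \frac{R_{\uparrow\downarrow} - R_{\uparrow\uparrow}}{R_{\uparrow\uparrow}}
         = \frac{(\rho_{F\uparrow} - \rho_{F\downarrow})^2}
                {(2\rho_{F\uparrow} + \chi\rho_N)(2\rho_{F\downarrow} + \chi\rho_N)}
```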

Here the subscript of R denote collinear and oppositely oriented magnetization in layers, χ = b/a is the thickness ratio of the magnetic and non-magnetic layers, and ρN is the resistivity of non-magnetic metal. This expression is applicable for both CIP and CPP structures.

Make of that what you will.


Conservatives and libertarians have a lot of things in common. In fact, for all intents and purposes, libertarians in the United States are basically conservatives who are open about liking sex and drugs. (Conservatives and libertarians both like sex and drugs; conservatives just don’t cop to it.)

One of the many areas they agree on is that the government should not be funding science, particularly “pure” science with no obvious technological or commercial application.

Another thing they have in common is they don’t understand what science is. In the field of pure research, you can never tell what will have technological or commercial application.

Back to Peter Grünberg. He discovered that quantum mechanics makes magnets act really weird, and in 2007 he shared a Nobel Prize with French physicist Albert Fert, a researcher at the French Centre national de la recherche scientifique (French National Centre for Scientific Research), France’s largest government-funded research facility.

And it turns out this research had very important commercial applications:

You know how in the 80s and 90s, hard drives were these heavy, clunky things with storage capacities smaller than Rand Paul’s chances at ever winning the Presidency? And then all of a sudden they were terabyte this, two terabyte that?

Some clever folks figured out how to use this weird quantum mechanics voodoo to make hard drive heads that could respond to much smaller magnetic fields, meaning more of them could be stuffed on a magnetic hard drive platter. And boom! You could carry around more storage in your laptop than used to fit in a football stadium.

It should be emphasized that Peter Grünberg and Albert Fert were not trying to invent better hard drives. They were government physicists, not Western Digital employees. They were exploring a very arcane subject–what happens to magnetic fields at a quantum level–with no idea what they would find, or whether it would be applicable to anything.


So let’s talk about your money.

When it became obvious that this weird quantum voodoo did have commercial possibility, the Germans patented it. IBM was the first US company to license the patent; today, nearly all hard drives license giant magnetoresistance patents. Which means every time you buy a hard drive, or a computer with a hard drive in it, some of your money flows back to Germany.

Conservatives and libertarians oppose government funding for science because, to quote the Cato Institute,

[G]overnment funding of university science is largely unproductive. When Edwin Mansfield surveyed 76 major American technology firms, he found that only around 3 percent of sales could not have been achieved “without substantial delay, in the absence of recent academic research.” Thus some 97 percent of commercially useful industrial technological development is, in practice, generated by in-house R&D. Academic science is of relatively small economic importance, and by funding it in public universities, governments are largely subsidizing predatory foreign companies.

Make of that what you will. I’ve read it six times and I’m still not sure I understand the argument.

The Europeans are less myopic. They understand two things the Americans don’t: pure research is the necessary foundation for a nation’s continued economic growth, and private enterprise is terrible at funding pure research.

Oh, there are a handful of big companies that do fund pure research, to be sure–but most private investment in research comes after the pure, no-idea-if-this-will-be-commercially-useful, let’s-see-how-nature-works variety.

It takes a lot of research and development to get from the “Aha! Quantum mechanics does this strange thing when this happens!” to a gadget you have in your home. That also takes money and development, and it’s the sort of research private enterprise excels at. In fact, the Cato Institute cites many examples of biotechnology and semiconductor research that are privately funded, but these are types of research that generally already have a clear practical value, and they take place after the pure research upon which they rest.

So while the libertarians unite with the Tea Party to call for the government to cut funding for research–which is working, as government research grants have fallen for the last several years in a row–the Europeans are ploughing money into their physics labs and research facilities and the Large Hadron Collider, which I suspect will eventually produce a stream of practical, patentable ideas…and every time you buy a hard drive, some of your money goes to Germany.

Modern societies thrive on technological innovation. Technological innovation depends on understanding the physical world–even when it seems at first like there aren’t any obvious practical uses for what you learn. They know that; we don’t. I think that’s going to catch up with us.