Archive for the ‘Anonymity & Privacy’ Category

Quickies: dns, tor 0.2.0.x, truecrypt 6, cold boot sw, fips stuff, j-pake

Wednesday, July 23rd, 2008

I am posting these snippets in a period of quiet between bedraggling, booming storms. Alas, dear reader, another set of quickies, as summertime rolls.

-

So, the DNS implementation flaw has (allegedly) been revealed. Grinding through weak state combined with additional RR trickery gets the job done. Good stuff. Adding more entropy to the state information is one way to push the attack out of feasible range, as demonstrated by djbdns (yes, I mentioned djbdns at one point). Patches abound.
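
To put rough numbers on the entropy point, here is a back-of-the-envelope sketch. It assumes the attacker's forged responses are independent uniform guesses at the resolver's secret state, and that a randomized source port adds up to 16 more bits on top of the 16-bit transaction ID (the usable port range is smaller in practice). The function name and the packet count are made up for illustration.

```python
def spoof_success_probability(state_bits: int, forged_responses: int) -> float:
    """Chance that at least one of `forged_responses` independent random
    guesses matches a secret drawn uniformly from 2**state_bits values."""
    miss = 1.0 - 1.0 / (2 ** state_bits)
    return 1.0 - miss ** forged_responses


TXID_BITS = 16  # DNS transaction ID width
PORT_BITS = 16  # assumption: upper bound for a fully random UDP source port
FORGED = 65536  # hypothetical number of forged responses the attacker lands

id_only = spoof_success_probability(TXID_BITS, FORGED)
id_and_port = spoof_success_probability(TXID_BITS + PORT_BITS, FORGED)

print(f"transaction ID only:   {id_only:.4f}")
print(f"ID plus random port:   {id_and_port:.8f}")
```

Same attacker effort, wildly different odds once the port is part of the secret.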

Update: An exploit.
-

The 0.2.0.x branch of Tor has gone from alpha to release, with 0.2.0.30 being the first version deemed fit for primetime.

I tagged Tor 0.2.0.30, the stable release of the 0.2.0.x series, on Tuesday.

You can read all about it here:
https://svn.torproject.org/svn/tor/tags/tor-0_2_0_30/ReleaseNotes

We haven’t made an official announcement yet, because we’re waiting for Torbutton 1.2.0 to go stable so we can build packages for the Windows and OS X folks.

From the release notes:

Changes in version 0.2.0.30 - 2008-07-15
This new stable release switches to a more efficient directory distribution design, adds features to make connections to the Tor network harder to block, allows Tor to act as a DNS proxy, adds separate rate limiting for relayed traffic to make it easier for clients to become relays, fix a variety of potential anonymity problems, and includes the usual huge pile of other features and bug fixes.

Great work.

-

The TrueCrypt team has been quite busy it seems, with two major releases this year within a few months of each other. So, TrueCrypt 6 has been rolled out, and I took note of the following new features.

Parallelized encryption/decryption on multi-core processors (or multi-processor systems). Increase in encryption/decryption speed is directly proportional to the number of cores and/or processors.

Each volume created by this or later versions of TrueCrypt will contain an embedded backup header (located at the end of the volume). Note that it is impossible to mount a volume when its header is damaged (the header contains an encrypted master key). Therefore, embedded backup headers significantly reduce this risk. Also note that a backup header is not a copy of the original volume header because it is encrypted with a different header key derived using a different salt. For more information, see the subsection Tools > Restore Volume Header.

More great work.

-

Remember cold boot? Well, the software created as part of the research has now been released.

July 16, 2008 — This page contains source code for some of the software that we developed in the course of this research. These prototype applications are intended to illustrate the techniques described in the paper; we are unable to provide technical support.

-

You may have noticed this.

4Q08 The second draft of FIPS 140-3 will be published for public comment (subject to change).

-

A few new modes of operation for the AES have popped up at NIST in somewhat recent months. One of these, XTS, is now being proposed by NIST for approval for government use, meaning it could become an approved mode of operation for AES under FIPS 140.

The P1619 Task Group of the Security in Storage Working Group (SISWG) of the Institute of Electrical and Electronics Engineers, Inc. (IEEE) has submitted the XTS-AES algorithm (XTS, for short) to NIST as an encryption mode of operation of the Advanced Encryption Standard (AES) block cipher. Although XTS does not provide authentication in order to avoid expansion of the data, it is designed to provide some protection against malicious manipulation of the encrypted data. Subject to the 90-day period of public comment that is described below, NIST proposes to approve XTS for government use under the auspices of FIPS Pub. 140-2.

-

Draft SP 800-107, Recommendation for Applications Using Approved Hash Algorithms, has been published and is open to public comment until 09-Oct-2008.

This Recommendation provides security guidelines for achieving the required or desired security strengths of several cryptographic applications that employ the approved cryptographic hash functions specified in Federal Information Processing Standard (FIPS) 180-3 [FIPS 180-3], such as digital signature applications [FIPS 186-3], Keyed-hash Message Authentication Codes (HMACs) [FIPS 198-1] and Hash-based Key Derivation Functions (HKDFs) [SP 800 56A] & [SP 800 56B].

Update: Related to the SP 800-107 news, a revision to FIPS 198 has been released, FIPS 198-1.

APPENDIX A: The Differences Between FIPS 198 and FIPS 198-1
The length of truncated HMAC outputs and their security implications in FIPS 198 is not mentioned in this Standard; instead, it is described in SP 800-107. The discussion about the limitations of MAC algorithms has been moved to SP 800-107. The examples and OIDs have been posted on the NIST web sites referenced in Section 6.
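
For anyone who has not dealt with truncated HMAC outputs, a minimal sketch of the idea follows. It uses HMAC-SHA256 from the Python standard library and keeps the leftmost bytes of the full tag. The helper names and the MIN_TAG_BYTES floor are my own placeholders, not values taken from FIPS 198-1 or SP 800-107.

```python
import hashlib
import hmac

MIN_TAG_BYTES = 8  # placeholder policy value, NOT taken from the standard


def truncated_hmac(key: bytes, message: bytes, tag_bytes: int) -> bytes:
    """Compute HMAC-SHA256 and keep only the leftmost `tag_bytes` bytes."""
    full = hmac.new(key, message, hashlib.sha256).digest()
    if not 0 < tag_bytes <= len(full):
        raise ValueError("tag length out of range")
    return full[:tag_bytes]


def verify_truncated(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the truncated tag and compare in constant time."""
    if len(tag) < MIN_TAG_BYTES:
        return False  # refuse tags too short to mean much
    expected = truncated_hmac(key, message, len(tag))
    return hmac.compare_digest(expected, tag)


key = b"demo key, not for real use"
tag = truncated_hmac(key, b"hello", 16)
print(verify_truncated(key, b"hello", tag))
print(verify_truncated(key, b"hellp", tag))
```

The shorter the tag, the easier it is to guess, which is the sort of trade-off the move to SP 800-107 is about.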

-

Yes please.

Abstract. Password-Authenticated Key Exchange (PAKE) studies how to establish secure communication between two remote parties solely based on their shared password, without requiring a Public Key Infrastructure (PKI). Despite extensive research in the past decade, this problem remains unsolved. Patent has been one of the biggest brakes in deploying PAKE solutions in practice. Besides, even for the patented schemes like EKE and SPEKE, their security is only heuristic; researchers have reported some subtle but worrying security issues.

In this paper, we propose to tackle this problem using an approach different from all past solutions. Our protocol, Password Authenticated Key Exchange by Juggling (J-PAKE), achieves mutual authentication in two steps: first, two parties send ephemeral public keys to each other; second, they encrypt the shared password by juggling the public keys in a verifiable way. The first use of such a juggling technique was seen in solving the Dining Cryptographers problem in 2006. Here, we apply it to solve the PAKE problem, and show that the protocol is zero-knowledge as it reveals nothing except one-bit information: whether the supplied passwords at two sides are the same. With clear advantages in security, our scheme has comparable efficiency to the EKE and SPEKE protocols.

Quickies: headaches, links, stories, crypto stuff, other

Wednesday, May 21st, 2008

So, I recently saw Juno for the first time, and was surprised and happy to hear Kimya Dawson permeate the soundtrack. And, I ate at wd~50, which was, well, an experience - highly recommended if you want to try something truly new.

Anyway, yes, still here. Unfortunately, this rare act of posting will be limited to a quickie - a few miscellaneous items accumulated over a couple of months.

-

A few headaches…

  1. Wow, just wow.

    The result of this is that for the last two years (from Debian’s “Etch” release until now), anyone doing pretty much any crypto on Debian (and hence Ubuntu) has been using easily guessable keys. This includes SSH keys, SSL keys and OpenVPN keys.

    Update immediately. And be sure to do things such as regenerate all persistent keys that used random data taken from the vulnerable Debian OpenSSL during their generation - some of this type of work is handled automatically when updating your packages (e.g., OpenSSH server keys), but only you know what you have done outside this automated window.

  2. I booted up a Windows XP box only to find that some resource of the Logitech QuickCam software had become corrupted, and this resulted in a nasty msi installer loop hitting the box as soon as a user logged in. I found this tool extremely useful in cleaning up the mess.

    [..]With the Windows Installer CleanUp Utility, you can remove a program’s Windows Installer configuration information. You may want to remove the Windows Installer configuration information for your program if you experience installation (Setup) problems. For example, you may have to remove a program’s Windows Installer configuration information if you have installation problems when you try to add (or remove) a component of your program that was not included when you first installed your program.

  3. Much to my very unpleasant surprise, I upgraded a server over here from Ubuntu Gutsy to Hardy and discovered networking for Xen DomU’s in Ubuntu Hardy 8.04 is somewhat broken. I ended up using the 2.6.24-17-xen kernel from the hardy-proposed repository, but you could also just stick with the Gutsy Xen kernel for DomU’s.
  4. When I upgraded to Gnome 2.22 on a particular FreeBSD 7.0 system, certain things, like the clock applet, did not work. This was due to the dbus daemon not being started. Then, the hal daemon and Gnome did not want to play nice together. This page provided the information necessary to get them playing nice - in this particular instance, not mounting procfs was the problem.

Side note: I was helping someone install Ubuntu recently, and it reminded me yet again how much I take for granted what I consider to be basic skills when using *nix. Small things, like running an executable found via your environment path versus directly specifying its path (the example that seems to confuse people most is trying to run an executable in the current working directory), are just not common knowledge. Even the command line itself is often scary and bizarre to people. This makes it extremely difficult to provide useful guidance for people to blindly follow (posts on this blog should never be assumed to provide step-by-step instructions - they are just some basic notes at best, as extremely helpful comments like this make obvious). And even Ubuntu is not as trivial to use as it would appear to many who work in the *nix world.

-

Three useful Ubuntu and FreeBSD pages…

USN.

These are the Ubuntu security notices that affect the current supported releases of Ubuntu.[...]

FreeBSD VuXML.

Security issues that affect the FreeBSD operating system or applications in the FreeBSD Ports Collection are documented using the Vulnerabilities and Exposures Markup Language (VuXML).[...]

FreeBSD Security Advisories.

This web page contains a list of released FreeBSD Security Advisories.[...]

-

Story-telling is one of the fundamental tools in a teacher’s toolkit. Having done quite a bit of consulting, I have learned how invaluable a good story is to driving home particular points and building relationships. There is something fundamental about how stories affect us, perhaps stemming from our innate ability to empathize and our massive pattern recognition horsepower.

Anyway, this site has a useful set of stories. [via exi]

Here are some stories, analogies, research findings and other examples that provide wonderful illustrations for learning, and inspiration for self-development.

In fact, the first story mirrors a recent post.

An old lady had a hearing-aid fitted, discreetly, hidden underneath her hair.

A week later she returned to the doctor for her check-up.

“It’s wonderful - I can hear everything now,” she reported very happily to the doctor.

“And is your family pleased too?” asked the doctor.

“Oh I haven’t told them yet,” said the old lady, “And I’ve changed my will twice already..”

This reminds me of a story I often tell about someone I met a while back, a graduate student in psych. She was studying aspects of the initial meeting/courtship routines of people, and would go out to bars and such and interact with potential suitors. Here these suitors were, following their typical pickup routines and sometimes spilling more than their drinks, and here she was, analyzing their interaction and subsequently writing up notes to be turned into research papers, etc.

-

NIST draft SP 800-108 has been released.

SP 800-108

DRAFT Recommendation for Key Derivation Using Pseudorandom Functions

NIST announces the release of draft Special Publication 800-108, Recommendation for Key Derivation Using Pseudorandom Functions. This Recommendation specifies techniques for key derivation from a secret key using pseudorandom functions (PRF). Please submit comments to draft-SP800-108-comment@nist.gov with “Comments on SP800-108” in the subject line. The comment period closes on June 28, 2008.

Yet more KDFs.
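
For flavor, here is a rough sketch of a counter-mode KDF in the general shape the draft describes: run a PRF over a counter plus some fixed input, and concatenate blocks until you have enough output. HMAC-SHA256 as the PRF, the 32-bit big-endian counter and length fields, and the exact field order are all my assumptions for illustration. Check the document itself before relying on any of it.

```python
import hashlib
import hmac


def kdf_counter_mode(key_in: bytes, label: bytes, context: bytes, out_len: int) -> bytes:
    """Derive `out_len` bytes from `key_in` by iterating a PRF over a counter."""
    prf_len = hashlib.sha256().digest_size
    blocks = -(-out_len // prf_len)  # ceiling division
    length_field = (out_len * 8).to_bytes(4, "big")  # output length in bits (assumed width)
    output = b""
    for i in range(1, blocks + 1):
        # counter || label || 0x00 || context || length (assumed encoding)
        data = i.to_bytes(4, "big") + label + b"\x00" + context + length_field
        output += hmac.new(key_in, data, hashlib.sha256).digest()
    return output[:out_len]


master = b"\x0b" * 32  # demo key only
enc_key = kdf_counter_mode(master, b"encryption", b"session-1", 32)
mac_key = kdf_counter_mode(master, b"mac", b"session-1", 48)
print(enc_key.hex())
print(mac_key.hex())
```

Different labels give unrelated keys from the same master key, which is the whole point of the exercise.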

-

They just don’t quit, do they?

1

This is the first article analyzing the security of SHA-256 against fast collision search which considers the recent attacks by Wang et al. We show the limits of applying techniques known so far to SHA-256. Next we introduce a new type of perturbation vector which circumvents the identified limits. This new technique is then applied to the unmodified SHA-256. Exploiting the combination of Boolean functions and modular addition together with the newly developed technique allows us to derive collision-producing characteristics for step-reduced SHA-256, which was not possible before. Although our results do not threaten the security of SHA-256, we show that the low probability of a single local collision may give rise to a false sense of security.

2

We study the security of step-reduced but otherwise unmodified SHA-256. We show the first collision attacks on SHA-256 reduced to 23 and 24 steps with complexities $2^{18}$ and $2^{50}$, respectively. We give an example colliding message pair for 23-step SHA-256. The best previous, recently obtained result was a collision attack for up to 22 steps. Additionally, we show non-random behaviour of SHA-256 in the form of pseudo-near collisions for up to 31 steps, which is 6 more steps than the recently obtained non-random behaviour in the form of a semi-free start near-collision. Even though this represents a step forwards in terms of cryptanalytic techniques, the results do not threaten the security of applications using SHA-256.

3

[...]First we describe message modification techniques and use them to obtain an algorithm to generate message pairs which collide for the actual SHA-256 reduced to 18 steps. Our second contribution is to present differential paths for 19, 20, 21, 22 and 23 steps of SHA-256. We construct parity check equations in a novel way to find these characteristics. Further, the 19-step differential path presented here is constructed by using only 15 local collisions, as against the previously known 19-step near collision differential path which consists of interleaving of 23 local collisions. Our 19-step differential path can also be seen as a single local collision at the message word level. We use a linearized local collision in this work. These results do not cause any threat to the security of the SHA-256 hash function.

4

[...]We build on the work of Nikoli\’{c} and Biryukov and provide a generalized nonlinear local collision which accepts an arbitrary initial message difference. This local collision succeeds with probability 1. Using this local collision we present attacks against 18-step SHA-256 and 18-step SHA-512 with arbitrary initial difference. Both of these attacks succeed with probability 1. We then present special cases of our local collision and show two different differential paths for attacking 20-step SHA-256 and 20-step SHA-512. One of these paths is the same as presented by Nikoli\’{c} and Biryukov while the other one is a new differential path. Messages following both these differential paths can be found with probability 1. This improves on the previous result where the success probability of 20-step attack was 1/3. Finally, we present two differential paths for 21-step collisions for SHA-256 and SHA-512, one of which is a new path. The success probability of these paths for SHA-256 is roughly $2^{-15}$ and $2^{-17}$ which improves on the 21-step attack having probability $2^{-19}$ reported earlier. We show examples of message pairs following all the presented differential paths for up to 21-step collisions in SHA-256. We also show first real examples of colliding message pairs for up to 20-step reduced SHA-512.

Completely academic, but you know what they say - attacks only get better.

-

Interesting stuff.

NSA/CSS periodically releases declassified documents or indexes to these documents to the public. The documents listed on this page were located in response to numerous requests received by NSA on the various subjects stated and for which there appears to be a general public interest. The date after each entry reflects the most current release date of that material. When additional material for a given subject is updated then a new subject index date is posted. To select a subject index, click on the subject title.

In particular, the Cryptologic Spectrum Articles and Cryptologic Quarterly Articles have some fun reads.

-

A set of old web server log entries, easy to cut ‘n paste into a blog post, depicting an instance of Tor being used to proxy/anonymize automated probes of this web site.

anonymizer.blutmagie.de - - [13/Mar/2008:00:59:00 -0700] “GET /forum/phpbb/index.php HTTP/1.0” 404 285 “http://forum.d-kriptik.com/phpbb/index.php” “Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)”

tor.anonymous.proxy.quex.org - - [13/Mar/2008:00:59:01 -0700] “GET /forum/phpbb2/index.php HTTP/1.0” 404 286 “http://forum.d-kriptik.com/phpbb2/index.php” “Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)”

tor.anonymizer.ccc.de - - [13/Mar/2008:00:59:02 -0700] “GET /forum/forums/index.php HTTP/1.0” 404 286 “http://forum.d-kriptik.com/forums/index.php” “Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)”

tor.anonymizer.ccc.de - - [13/Mar/2008:00:59:08 -0700] “GET /forum/board/index.php HTTP/1.0” 404 285 “http://forum.d-kriptik.com/board/index.php” “Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)”
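
If you want to spot this pattern in your own logs, something like the following sketch would do. It assumes the combined log format with plain ASCII double quotes (the curly quotes above are a blog rendering quirk), and it groups 404s by user agent so that one "browser" hopping across several client hosts stands out. The regular expression and function names are mine.

```python
import re

LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse(line):
    """Return the fields of one combined-format log line, or None."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def probe_summary(lines):
    """Group 404 responses by user agent, collecting client hosts and paths."""
    by_agent = {}
    for line in lines:
        entry = parse(line)
        if entry is None or entry["status"] != "404":
            continue
        seen = by_agent.setdefault(entry["agent"], {"hosts": set(), "paths": []})
        seen["hosts"].add(entry["host"])
        seen["paths"].append(entry["path"])
    return by_agent


SAMPLE = [
    'anonymizer.blutmagie.de - - [13/Mar/2008:00:59:00 -0700] "GET /forum/phpbb/index.php HTTP/1.0" 404 285 "http://forum.d-kriptik.com/phpbb/index.php" "Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)"',
    'tor.anonymous.proxy.quex.org - - [13/Mar/2008:00:59:01 -0700] "GET /forum/phpbb2/index.php HTTP/1.0" 404 286 "http://forum.d-kriptik.com/phpbb2/index.php" "Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)"',
]

for agent, seen in probe_summary(SAMPLE).items():
    print(agent, sorted(seen["hosts"]), seen["paths"])
```

One user agent, several exit hosts, a run of guessed forum paths a second apart: that is the signature in the entries above.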

-

Like heat and humidity, free music and Central Park mean summer. The 2008 schedule is up at the Summer Stage web site. And, the opening Saturday looks good to me.

Vampire Weekend
Kid Sister
Ecstatic Sunshine

Saturday, June 14, 2008
From 4:00 PM to 7:30 PM
Central Park SummerStage

Quickies: smell, bot, book, wikipedia, moon

Tuesday, December 11th, 2007

I found this article interesting.

“The study suggests that people conscious of the barely noticeable scents were able to discount that sensory information and just evaluate the faces,” Li said. “It only was when smell sneaked in without being noticed that judgments about likeability were biased.”

In other words, awareness of the situation allows a person to adjust their response to suit the situation. There are two key elements at work here - being aware, and effectively using that awareness.

Anyway, this reminded me of Cialdini’s Influence. The attacks of influence are often carried out beneath the radar of the person being attacked. The attacker triggers automatic responses in the person to influence their decisions/behavior, and the actions that hit these triggers go unnoticed at a conscious level by the person being attacked at the time of attack, which results in the person being attacked not properly recognizing the level of influence coming from the attacker. Once a person is aware of triggers and/or able to recognize attempts to pull triggers, a person can work to mitigate the influence of triggers and/or the responses to triggers.

Side note, I always find this sort of thing interesting with regard to emotions and relationships. We all have emotional triggers, things that set off strong emotional responses. Learning to understand our triggers, and those of the people around us, can go a long way to having healthy, satisfying relationships. And, with such relationships, comes a great deal of our basic security.

-

Speaking of people, this article has been making the rounds.

The artificial intelligence of CyberLover’s automated chats is good enough that victims have a tough time distinguishing the “bot” from a real potential suitor, PC Tools said. The software can work quickly too, establishing up to 10 relationships in 30 minutes, PC Tools said. It compiles a report on every person it meets complete with name, contact information, and photos.

Ok, so, social engineering is nothing new, and love letters have flooded inboxes. But, it got me thinking for a second…

So, I often speak of using real people for people-based attacks leveraging things like beauty and charm. However, since real people tend to be a scarce resource, we are quite limited in the number of attacks that can be carried out and, the fewer attacks we can carry out, the more important each particular attack becomes. For in-person attacks, this people cost can reach extremes. On the other hand, if we go virtual, we can come up with all sorts of ways to farm out the people work to reduce its cost.

Coming back to the article at hand, as a potential way to combine this sort of bot and real people, perhaps a bot that bridged conversations serving as a middle man would be interesting. For example, the bot could hang out in multiple chat rooms or web forums, and cross connect conversations. Or, reply to Craigslist ads and link responders.

Of course, with none of the human participants likely to have the agenda of the attacker here, the conversations would probably have less of a chance of being useful to the attacker than the result of automated scripts, even if you could effectively pull off the bridging. Oh well.

-

I remember mentioning cell phone tanka a while back. This takes it to a new level.

“I typed it all on my mobile phone,” Rin explains matter-of-factly over the same device. “I started writing novels on my mobile when I was in junior high school and I got really quick with my thumbs, so after a while it didn’t take so long. I never planned to be a novelist, if that’s what you’d call me, so I’m still quite shocked at how successful it’s turned out.”

[...]

Remarkably, half of Japan’s top-10 selling works of fiction in the first six months of the year were composed the same way - on the tiny handset of a mobile phone. They sold an average of 400,000 copies. By August, the president of Goma Books, Masayoshi Yoshino, was declaring in a manifesto that he was determined “to establish this not simply as a fad, but as a new kind of culture”.

My “waiting to be read” book queue is at ~20. As far as I know, none of these books were written on a mobile phone. I really have to get with the times.

-

When you build a technology based on community input and open communication in a medium that lets gossip circle the world at roughly the speed of light, you can’t expect to hide behind a curtain. And, beautifully, the end result is an open study in people, power, and paranoia, with a good helping of “trust me, it’s for your own good” arguments and “shoot yourself in the foot” phenomena.

A couple of choice excerpts from the article,

Meanwhile, Durova continued to insist that she had some sort of secret evidence that could only be viewed by the Arbitration Committee. “I am very confident my research will stand up to scrutiny,” she said. “I am equally confident that anything I say here will be parsed rather closely by some disruptive banned sockpuppeteers. If I open the door a little bit it’ll become a wedge issue as people ask for more information, and then some rather deep research techniques would be in jeopardy.”

And,

This sort of extreme paranoia has become the norm among the Wikipedia inner circle. There are a handful sites across the web that spend most of their bandwidth criticizing the Wikipedia elite - the leading example being Wikipedia Review (http://wikipediareview.com/) - and the ruling clique spends countless hours worrying that these critics are trying to infiltrate the encyclopedia itself.

Now, I partially pointed to this because I know my circles are always amused by this stuff. But, I also wanted to note this.

But he’s not admitting how deep this controversy goes. Wales and the Wikimedia Foundation came down hard on the editor who leaked Durova’s email. After it was posted to the public forum, the email was promptly “oversighted” - i.e. permanently removed. Then this rogue editor posted it to his personal talk page, and a Wikimedia Foundation member not only oversighted the email again, but temporarily banned the editor.

It ain’t easy blowing whistles. Even in a supposedly open forum such as Wikipedia, the powers that be crack skulls. You know, silence the critics and keep them silent.

Here, that cracking of skulls is figurative. In other venues, it could be literal. Anonymity has its uses.

Oh, and in the conclusion of a related article, we have a good summary of what seems to be going on.

“Wikipedia, in its way, is of great benefit to the web community,” he says. “But I’ve also been greatly dismayed that Wikipedia has apparently attracted some intelligent but problematic personalities with ambition, secret personal agendas, and cold, ruthless behavior towards other editors and ideas that they perceive as threatening their power, position, or agendas. What’s disheartening is that Jimbo and the rest of the Wikimedia Foundation not only don’t do anything about it, but they appear to support these charlatans to some degree.”

-

I mentioned the contest previously. Well, here comes the first entrant.

The Google Lunar X-Prize folks held an event at a space investment conference in San Jose to announce their first fully-registered competitor.

Odyssey Moon, a startup based on the Isle of Man, and run by Carl Sagan mentee, Bob Richards and the CFO of satellite-provider Inmarsat, Ramin Khadem, plans to land a rover on the moon within the next seven years.

Quickies: ossl fips prng seeding, privoxy, gcm, hash stuff, misc

Monday, December 3rd, 2007

Ouch.

A significant flaw in the PRNG implementation for the OpenSSL FIPS Object Module v1.1.1 (http://openssl.org/source/openssl-fips-1.1.1.tar.gz, FIPS 140-2 validation certificate #733, http://csrc.nist.gov/groups/STM/cmvp/documents/140-1/140val-all.htm#733) has been reported by Geoff Lowe of Secure Computing Corporation. Due to a coding error in the FIPS self-test the auto-seeding never takes place. That means that the PRNG key and seed used correspond to the last self-test. The FIPS PRNG gets additional seed data only from date-time information, so the generated random data is far more predictable than it should be, especially for the first few calls.

I updated this post accordingly.

[...]This means the PRNG is not reseeded after the KAT, so the PRNG ends up seeded with constant self-test values.

A couple of patches [1,2] are available for the OpenSSL FIPS module. The patches boil down to running FIPS_rand_method()->cleanup() after the PRNG KAT and then reseeding the PRNG.
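
To see why this hurts, consider a toy model (emphatically not the OpenSSL code). If the generator's key and seed are constants anyone can read out of the self-test, and the only fresh input is a timestamp, then recovering the timestamp recovers the output stream. Below, SHA-256 stands in for the real generator, and the constants, the microsecond resolution, and the function names are all hypothetical.

```python
import hashlib

# Hypothetical stand-ins for fixed known-answer-test values.
SELF_TEST_KEY = bytes(16)
SELF_TEST_SEED = bytes(range(16))


def toy_prng_output(timestamp_us: int) -> bytes:
    """Toy generator: constant key and seed, with date-time as the only variable."""
    dt = timestamp_us.to_bytes(8, "big")
    return hashlib.sha256(SELF_TEST_KEY + SELF_TEST_SEED + dt).digest()[:16]


def recover_timestamp(observed: bytes, start_us: int, end_us: int):
    """Brute-force the timestamp over a window the attacker can guess."""
    for candidate in range(start_us, end_us):
        if toy_prng_output(candidate) == observed:
            return candidate
    return None


victim_time = 1_196_700_000_123_456  # some instant, in microseconds
observed = toy_prng_output(victim_time)

# An attacker who knows the time to within +/- 5 ms needs at most 10,000 tries.
guess = recover_timestamp(observed, victim_time - 5_000, victim_time + 5_000)
print(guess == victim_time)
```

With a properly reseeded generator the search space is the seed, not the clock, and this loop never finishes.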

-

In “related to Tor” news, this is a good write-up on recent vulnerabilities in what is often the default Privoxy configuration, including that shipped with the Tor bundle up until recently.

The installed ‘config.txt’ file (‘config’ on Mac OS X) had the following option values set to 1:

  • enable-remote-toggle
  • enable-edit-actions

Additionally, on Windows the following option was set to 1:

  • enable-remote-http-toggle

Malicious sites (or malicious exit nodes) could include active content (e.g., JavaScript, Java, Flash) that caused the web browser to:

  • make requests through the proxy that causes Privoxy filtering to be bypassed or completely disabled
  • establish a direct connection from the web browser to the local proxy and modify the user defined configuration values

It should be noted that these are not Tor specific attacks on Privoxy and you may want to disable these Privoxy configuration options even in non-Tor environments.
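
A quick way to check your own setup is a sketch along these lines. It scans the text of a Privoxy config for the three options named above and reports any that are set to 1. It assumes the usual "option value" line format with "#" starting a comment. The function name and the sample text are made up, and where your config file lives is up to your install.

```python
RISKY_OPTIONS = (
    "enable-remote-toggle",
    "enable-edit-actions",
    "enable-remote-http-toggle",
)


def enabled_risky_options(config_text: str) -> list:
    """Return the risky options that are set to 1 in the given config text."""
    found = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in RISKY_OPTIONS and parts[1].strip() == "1":
            found.append(parts[0])
    return found


SAMPLE_CONFIG = """
# sample fragment, not a real shipped config
listen-address 127.0.0.1:8118
enable-remote-toggle 1
enable-edit-actions 1
enable-remote-http-toggle 0
"""

print(enabled_risky_options(SAMPLE_CONFIG))
```

Anything it reports is a candidate for being set to 0.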

-

SP800-38D, specifying the GCM mode of operation, has been finalized.

Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC has been finalized. This Recommendation specifies and approves Galois/Counter Mode (GCM), an authenticated encryption mode of the Advanced Encryption Standard (AES) algorithm.

I remember superficially comparing GCM and CCM a few years back. Both seemed to have a push at NIST, but you knew CCM would go through the vetting process relatively quickly, being a combined mode of what was already accepted, while GCM would take a bit of time. Well, CCM has been approved for quite a while, and now GCM is finally there too.

-

These [1,2] have been making the rounds. More fun with MD5.

We announce two different Win32 executable files with different functionality but identical MD5 hash values. This shows that trust in MD5 as a tool for verifying software integrity, and as a hash function used in code signing, has become questionable.

We have used a Sony Playstation 3 to correctly predict the outcome of the 2008 US presidential elections. In order not to influence the voters we keep our prediction secret, but commit to it by publishing its cryptographic hash on this website. The document with the correct prediction and matching hash will be revealed after the elections.

-

Speaking of hashing, there is a mailing list for the NIST hash competition.

A hash-forum@nist.gov email mailing list has been established for dialogue regarding NIST’s Cryptographic Hash Workshops and Hash Algorithm Competition. It is an unmoderated mailing list; messages addressed to this list are immediately distributed to all the addresses on the list. Only members are allowed to post messages to the list; however, anyone who wishes to do so may add themselves to the list.

-

A location service by Google relying on cell towers to estimate your location when GPS is not available.

Why the uncertainty? The My Location feature takes information broadcast from mobile towers near you to approximate your current location on the map - it’s not GPS, but it comes pretty close (approximately 1000m close, on average). We’re still in beta, but we’re excited to launch this feature and are constantly working to improve our coverage and accuracy.

-

Finally, I found this somewhat interesting.

“The empirical fact is that people will often switch to strategies they never picked before. They couldn’t have learned these strategies by reinforcement” from experienced rewards, says Camerer. In these situations, people use imagined rewards, or rewards that could have been theirs, to guide their decision making. This process, called fictive learning, is similar to the emotion of regret. “Regret is essentially the bodily sensation or name we give to fictive learning when there was a better choice than the one we chose.”

data sets, walks, kdfs, banksy, odds

Wednesday, November 28th, 2007

I heard Lotus Cafe is gone. And, Rififi may not be happy either. Ah, old timers.

-

Say you have a data set you wish to release for research purposes (e.g., a medical database), but you don’t want the individual people identified and thus tied to particular sensitive attributes (e.g., medical conditions). So, you have this data set consisting of what you consider to be sensitive attributes (e.g., medical conditions) and non-sensitive attributes (e.g., social security number, date of birth, zip code). In order to anonymize the data, you strip out the attributes that serve as blatantly unique identifiers (e.g., names, social security numbers).

Now, there is an immediate issue here. The obvious identifiers have been removed, but let’s say this data set also contains attributes (e.g., date of birth, zip code) that when linked to external information lead to the identification of particular individuals (e.g., there is only one person with a given birthday in a given zip code, and this information is readily available to someone trying to identify that individual in the data set).

This is where k-anonymity comes in. Now, you have already identified the sensitive and non-sensitive attributes, and you have removed the outright identifiers. From the remaining set of non-sensitive attributes, you can then identify those attributes, which are called quasi-identifiers, that could be linked to individuals through correlation against external data sources.

Here we come to one of the key assumptions made in the k-anonymizing process - the sanitizer can identify the attributes in the data set that can be tied to other, external sources. Not only do they have to identify these attributes, but they also have to assess the level of resources required to use those attributes to penetrate the anonymity of the data set, and make judgment calls on modifying the data for anonymity and/or privacy versus the usefulness of the resulting data set. This is hard stuff.

You can see many issues with this assumption. Does the sanitizer really know what will be quasi-identifiers in a data set? Even if they do, are their judgments about the risk posed by those identifiers realistic? For example, the sanitizer might not know about various public records or even Google. As another example, one may assume that only large governments would have access to name, date of birth, and zip code information for individuals. However, most of us have access to this information for at least our friends and family.

Anyway, once you have picked out the quasi-identifiers, you then ensure that at least k records in the data set share each combination of quasi-identifier values that appears (e.g., generalize the date of birth to year of birth, such that at least 10 records turn up for every year of birth and zip code combination). In other words, instead of being able to pinpoint a single record using particular values for the quasi-identifiers, one always ends up with a set of at least k records for any particular values of the quasi-identifiers (e.g., you pull up at least 10 records for every year of birth and zip code combination).
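As a rough sketch of that check (the record layout and the function names `generalize` and `is_k_anonymous` are my own, purely for illustration), one can group records by their generalized quasi-identifiers and confirm that no group is smaller than k:

```python
from collections import Counter


def generalize(record):
    """Coarsen the quasi-identifiers: keep only the year of birth plus zip."""
    return (record["dob"][:4], record["zip"])


def exact(record):
    """No generalization: full date of birth plus zip."""
    return (record["dob"], record["zip"])


def is_k_anonymous(records, k, quasi=generalize):
    """True if every quasi-identifier combination covers at least k records."""
    counts = Counter(quasi(r) for r in records)
    return all(n >= k for n in counts.values())


# Hypothetical records, already stripped of names and SSNs.
records = [
    {"dob": "1982-01-05", "zip": "10001", "condition": "asthma"},
    {"dob": "1982-07-19", "zip": "10001", "condition": "flu"},
    {"dob": "1982-11-02", "zip": "10001", "condition": "depression"},
]

ok_generalized = is_k_anonymous(records, 3)           # year of birth + zip
ok_exact = is_k_anonymous(records, 2, quasi=exact)    # full birthday + zip
```

Here the three records are 3-anonymous once birthdays are cut down to the year, but not even 2-anonymous with full dates of birth.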

Important note for the paper that follows later in this post: You may be wondering, what happens when dealing with all these sparse data sets with long-tail distributions? For example, a particular attribute may have a large number of values that are unique or at least very thinly spread across records, which could mean huge impacts on the data set if these values were generalized or removed for k-anonymity purposes. That means there may be a big trade-off between anonymity and/or privacy and the usefulness of the data: keeping the data at all useful will force k down, if it does not render k-anonymity infeasible altogether. And, it is notable that most transaction databases fall into this category (e.g., credit card records, Amazon purchases).

But, say k-anonymity is reasonable for the data without too much loss of usefulness of the data. Great, that covers establishing the exact identity of records, but there is an issue here - we may still be able to tie sensitive attributes to individuals. For example, if the sensitive attributes you are trying to unlink from an individual are applicable to all of the k records pulled up by the quasi-identifiers (e.g., all 10 records have the same medical condition), then one can assume that the individual possesses this attribute even if they can’t uniquely identify which of the k records is that individual. Or, if one knows particular sensitive values do or do not apply to the individual, they can rule out those records, perhaps pinpointing the applicable value of the sensitive attribute for the individual concerned (e.g., the medical condition in the group is either a broken arm or severe depression, and one knows the individual does not have a broken arm).

Which leads to the concept of l-diversity, wherein each set of k records should contain at least l distinct values for the sensitive attributes. Now, when one pulls up that set of k records, the records do not all have the same sensitive values and, even if I know certain sensitive values do or do not apply to the individual in question, there is potentially still a range of records with other values for the sensitive attributes, making it hard for me to establish exactly which values apply to a particular individual.
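A minimal sketch of the simplest reading of this (counting distinct sensitive values per group, which is only one of several ways l-diversity gets defined) might look like the following. The data and the `is_l_diverse` name are assumptions for illustration.

```python
from collections import defaultdict


def quasi(record):
    """Quasi-identifiers after generalization: year of birth plus zip."""
    return (record["dob"][:4], record["zip"])


def is_l_diverse(records, l, sensitive="condition"):
    """True if every quasi-identifier group holds at least l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        groups[quasi(r)].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical group where everyone shares one condition: k-anonymous, not diverse.
uniform = [
    {"dob": "1982-01-05", "zip": "10001", "condition": "depression"},
    {"dob": "1982-07-19", "zip": "10001", "condition": "depression"},
]

# Same group with two different conditions.
mixed = [
    {"dob": "1982-01-05", "zip": "10001", "condition": "depression"},
    {"dob": "1982-07-19", "zip": "10001", "condition": "broken arm"},
]

uniform_ok = is_l_diverse(uniform, 2)
mixed_ok = is_l_diverse(mixed, 2)
```

Note that the second group passes with l = 2, yet it is exactly the broken arm versus depression case above: rule out one value and you know the other.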

l-diversity can result in the data set undergoing significant modifications. Like with k-anonymity, judgment calls must be made on privacy versus the usefulness of the data set.

But, say l-diversity is applicable to the data set without too much loss of usefulness of the data. Nice, but we may still have some semantic ties or non-uniformity that can be exploited. For example, if all the k records have related values for the sensitive attributes (e.g., all the medical conditions are mental health problems) while the overall data set covers a larger variety of values, then information is learned about the members of k. Or, if the sensitive attributes in the k records have one set of odds of having particular values (e.g., 75% of members have cancer), while in the overall data set the sensitive attributes have a different set of odds (e.g., 10% of members have cancer), this can be used to reveal information about the set of k records that distinguishes it from the overall data set.

Which leads to t-closeness, wherein the distribution of values of sensitive attributes in a set of k records will mimic the distribution of those values across the whole data set within a range t.
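Here is a small sketch of that comparison. As a simplifying assumption, I measure the gap between distributions with the variational distance (half the sum of absolute differences), which suits a categorical attribute where all values are treated as equally far apart; other distance measures could be swapped in. The function names and data are made up.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Map each value to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}


def variational_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_group_distance(records, quasi, sensitive="condition"):
    """Largest distance between any group's distribution and the overall one."""
    overall = distribution(r[sensitive] for r in records)
    groups = defaultdict(list)
    for r in records:
        groups[quasi(r)].append(r[sensitive])
    return max(variational_distance(distribution(v), overall) for v in groups.values())


# Hypothetical data: one zip code skews toward cancer, the other toward flu.
records = (
    [{"zip": "10001", "condition": "cancer"}] * 3
    + [{"zip": "10001", "condition": "flu"}]
    + [{"zip": "94110", "condition": "cancer"}]
    + [{"zip": "94110", "condition": "flu"}] * 3
)

worst = max_group_distance(records, quasi=lambda r: r["zip"])
```

The data set then satisfies t-closeness (under this distance) for any t at or above `worst`; a small t demands groups that look much like the whole.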

t-closeness can result in the data set undergoing even more significant changes. Like with k-anonymity and l-diversity, judgment calls must be made on privacy versus the usefulness of the data set.

And so forth.

I typed out that summary because I just noted this interesting paper. [via what is left of the cypherpunks list]

We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary’s background knowledge.
We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.

A few paragraphs of note…

Micro-data are characterized by high dimensionality and sparsity. Informally, micro-data records contain many attributes, each of which can be viewed as a dimension (an attribute can be thought of as a column in a database schema). Sparsity means that a pair of random records are located far apart in the multi-dimensional space defined by the attributes. This sparsity is empirically well-established [6, 4, 16] and related to the “fat tail” phenomenon: individual transaction and preference records tend to include statistically rare attributes.

This applies to most of those real-world databases out there containing information about us, which is something to note when policies say that information that could identify you has been removed from a data set before that data set is distributed.

Our de-anonymization algorithms are designed to work against databases that have been anonymized and “sanitized” by their publishers. The three main sanitization methods are perturbation, generalization, and suppression [23, 8]. Furthermore, the data publisher may only release a (possibly non-uniform) sample of the database. For example, he may attempt to k-anonymize the records, and then release only one record out of each cluster of k or more records.
If the database is released for collaborative filtering or similar data mining purposes (as in the case of the Netflix Prize dataset), the “error” introduced by sanitization cannot be large, otherwise its utility will be lost. We make this precise in our analysis. Our definition of privacy breach (see below) allows the adversary to identify not just his target’s record, but any record as long as it is sufficiently similar (via Sim) to the target and can thus be used to determine its attributes with high probability.

The tradeoff between privacy and/or anonymity, and usefulness comes into play, and the authors make sure to take advantage of it. The real-world is a fun place.
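To get a feel for why sparsity hurts, here is a toy sketch. This is not the paper's algorithm; the scoring rule, the `margin` threshold, and the data are all my own assumptions. The idea is simply that a fuzzy match on a rarely rated movie says far more than a match on a blockbuster, and that a guess is only worth making if the best candidate clearly stands out.

```python
import math

# Hypothetical sanitized release: record id -> {movie: rating}.
release = {
    "r1": {"Blockbuster": 5, "Obscure Doc": 4, "Cult Film": 2},
    "r2": {"Blockbuster": 5, "Rom Com": 3},
    "r3": {"Blockbuster": 4, "Rom Com": 4, "Sequel": 1},
}


def support(release, movie):
    """Number of records that rated the given movie."""
    return sum(1 for ratings in release.values() if movie in ratings)


def score(aux, ratings, release, tolerance=1):
    """Sum rarity weights over movies whose ratings roughly agree with aux."""
    total = 0.0
    for movie, rating in aux.items():
        if movie in ratings and abs(ratings[movie] - rating) <= tolerance:
            total += 1.0 / math.log(1 + support(release, movie))
    return total


def best_match(aux, release, margin=1.5):
    """Return the top record id only if it beats the runner-up by the margin.

    Assumes the release holds at least two records.
    """
    ranked = sorted(
        ((score(aux, r, release), rid) for rid, r in release.items()),
        reverse=True,
    )
    (best, rid), (second, _) = ranked[0], ranked[1]
    if best > 0 and best >= margin * second:
        return rid
    return None


# The adversary's slightly wrong background knowledge about the target.
guess = best_match({"Blockbuster": 4, "Obscure Doc": 5}, release)
no_guess = best_match({"Blockbuster": 5}, release)
```

Knowing only the blockbuster rating gets the adversary nowhere, while one approximate rating of an obscure title is enough to single out a record, even though both ratings in the background knowledge are off by one.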

Moreover, the linkage between an individual and her movie viewing history has implications for her future privacy. In network security, “forward secrecy” is important: even if the attacker manages to compromise a session key, this should not help him much in compromising the keys of future sessions. Similarly, one may state the “forward privacy” property: if someone’s privacy is breached (e.g., her anonymous online records have been linked to her real identity), future privacy breaches should not become easier. Now consider a Netflix subscriber Alice whose entire movie viewing history has been revealed. Even if in the future Alice creates a brand-new virtual identity (call her Ecila), Ecila will never be able to disclose any non-trivial information about the movies that she had rated within Netflix because any such information can be traced back to her real identity via the Netflix Prize dataset. In general, once any piece of data has been linked to a person’s real identity, any association between this data and a virtual identity breaks anonymity of the latter.

Anonymity tends to be a one way street. This can be particularly dangerous when it comes to persistent pseudonyms.

We have presented a de-anonymization methodology for multi-dimensional micro-data, and demonstrated its practical applicability by showing how to de-anonymize movie viewing records released in the Netflix Prize dataset. Our de-anonymization algorithm works under very general assumptions about the distribution from which the data are drawn, and is robust to perturbation and sanitization. Therefore, we expect that it can be successfully used against any large dataset containing anonymous multi-dimensional records such as individual transactions, preferences, and so on.
An interesting topic for future research is extracting social relationships, networks and clusters from the anonymous records. This knowledge can be a source of information for further de-anonymization [13]. In the case of the Netflix Prize dataset, de-anonymization of individual records may also have interesting implications for winning the Netflix Prize. We discuss this briefly in appendix B.

Data is recorded, filtered, and mined. You, your actions are not random. There will be non-uniformity, patterns. Identities and attributes will always appear.

I tend to think pseudonymity is the best you get in practice, and even that is quite difficult.

-

So, a while back I wrote this.

Side note, this is even more interesting in combination with things like the reality it generally takes couples trying to have a baby on the average of a few months to achieve pregnancy. Such drawn out periods seems to protect against a number of attacks, such as misrepresentation and quick judgement - for example, it exposes potential mates that seem good at first glance but aren’t quite so good once a closer look is provided.

Fitting right in here, I noted this.

However, Provost and her colleagues say there is in fact no contradiction between this research and other studies, as they are investigating two different kinds of signal. The previous research investigating men’s response to fertile women focused on signals such as smells and facial expressions, which can only be detected at close range. That makes evolutionary sense, as it would benefit a woman to advertise her fertility to a man that she has decided is worth having children with and has therefore allowed to get close to her.

In contrast, men can pick up on the attractiveness of a woman’s walk from long distance, and it can therefore act as an unwitting signal to less appealing males who she might not want to choose. So the advantage of having a less sexy walk around the time of ovulation becomes clear: it allows a woman to hide her fertile period from undesirable men who might take advantage of her at that time.

-

I noticed this request.

As many of you know, NIST has specified two standard KDFs for use with key agreement algorithms (e.g., Diffie-Hellman or MQV) in NIST SP 800-56A. NIST is considering supplementing the 800-56A KDFs with a more broadly applicable KDF. In particular, NIST is considering a proposal for an HMAC-based KDF. Before committing resources to this effort, we would like to get a better handle on the requirements seen by protocol developers and evaluate the level of support for such a standard. We would also like to identify alternative designs that should be considered.

PBKDF2 (something you may already use possibly without knowing it) and S2V were pointed to in replies.
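For reference, here is a minimal sketch of deriving a key with PBKDF2 using Python's standard library (`hashlib.pbkdf2_hmac`). The iteration count, salt size, and output length below are illustrative assumptions on my part, not recommendations from NIST or from the replies.

```python
import hashlib
import os


def derive_key(password: bytes, salt: bytes,
               iterations: int = 100_000, length: int = 32) -> bytes:
    """Derive `length` bytes from a password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=length)


salt = os.urandom(16)  # random per-password salt, stored next to the derived key
key = derive_key(b"correct horse battery staple", salt)
```

The same password and salt always yield the same key, so the salt and iteration count have to be kept around for later verification.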

-

Look at that, Banksy in New York.

VANINA HOLASEK GALLERY·502 West 27TH Street New York, NY 10001 T: 212-367-9093

FOR IMMEDIATE RELEASE:
BANKSY DOES NEW YORK
DECEMBER 2ND – DECEMBER 29TH, 2007
OPENING RECEPTION: SUNDAY, DECEMBER 2ND, 1 PM -5 PM

BANKROBBER GALLERY, London, in collaboration with VANINA HOLASEK GALLERY, are pleased to present for the first time in New York, an exhibition of works by Banksy, on view from December 2nd through December 29th, 2007.

There may even be a comment on security in here.

He’s the maniac who got on the news for managing to smuggle one of his pieces of art into Tate Britain and embarrassed everyone because nobody seemed to notice…He’s the wit behind the stencilled “Mind the Crap” writing that appeared overnight on the steps to Tate Modern. He is the prankster who smuggled 500 alternative copies of the Paris Hilton CD into record stores. He is the subversive who placed a life-size replica of a Guantanamo Bay detainee in Disneyland. He’s the jester who gave LA a painted elephant. He is the trickster whose hoax cave painting of a man pushing a supermarket trolley sat in the British Museum unnoticed for three days. He is the infiltrator who disguised as a pensioner hung his perfectly framed pieces in the Metropolitan, MOMA, Brooklyn Museum and his “dead beetle with glued on sidewinder missiles and satellite dish” had pride of place in the Museum of Natural History NYC. Get the picture, get this. Banksy images are even being used to sell 900k condos in Williamsburg.

Anyway, there is a party Saturday (2007-12-01) night from 6-9pm at the gallery celebrating the opening. [via thisheartsonfire]

-

Finally, here are some generic odds.

The National Safety Council has been compiling and reporting on injury data every year since the 1920s. The table below was prepared in response to frequent inquiries to the Council concerning the odds of dying from or being killed by a specific incident or occurrence such as a lightning strike or a plane crash.

The odds given below are statistical averages over the whole U.S. population and do not necessarily reflect the chances of death for a particular person from a particular external cause. Any individual’s odds of dying from various external causes are affected by the activities in which they participate, where they live and drive, what kind of work they do, and other factors.

Quick notes on Ubuntu Gutsy with Xen and djbdns, phishing lesson, misc

Thursday, November 8th, 2007

Just when you started thinking I was beginning to clean up my messy posting habits, I went and did this…

-

So, I decided to migrate my Tor server, etc. and thought it would be nice to upgrade to Ubuntu Gutsy in the process. (I also took this as an opportunity to setup disk encryption, which was quick and easy.)

As part of the effort, I rebuilt the Xen guest domains I was using on this server. This part turned out to have some quirkiness, as my Xen dom-0 and dom-Us running Ubuntu Gutsy (7.10) would not play nice under the Xen 3.1 that was installed from the latest Ubuntu Gutsy Xen packages (2.6.22-14-xen kernel). By not playing nice, I mean the guests would hang during the boot process and/or not provide a usable console.

So, for my reference and yours, I figured it good to point out where I found fixes/workarounds for these issues with Xen 3.1 (2.6.22-14-xen kernel) and Ubuntu Gutsy (7.10) (used for both the host and guests) - this link is where I found some guidance to help fix the issue.

In particular, I found this useful: it led me to copy “etc/event.d/tty1” to “etc/event.d/xvc0” and then replace all occurrences of “tty1” with “xvc0” within “etc/event.d/xvc0”, and it worked across a couple of running dom-Us. Alternatively, this seems like it might be a workaround, although I did not use it myself.
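If you would rather script the copy-and-replace than do it by hand, something like the sketch below would do it. The helper names are mine, and the sample job text is only a stand-in (the real “tty1” file on your system may look different); the demo runs against a temporary directory rather than a real guest file system.

```python
import tempfile
from pathlib import Path


def make_xvc0_job(tty1_text: str) -> str:
    """Return the job text with every occurrence of tty1 replaced by xvc0."""
    return tty1_text.replace("tty1", "xvc0")


def install_xvc0_job(event_dir: Path) -> Path:
    """Copy event_dir/tty1 to event_dir/xvc0, rewriting the console name."""
    source = event_dir / "tty1"
    target = event_dir / "xvc0"
    target.write_text(make_xvc0_job(source.read_text()))
    return target


# Illustrative stand-in for the real job file; actual contents may differ.
SAMPLE_TTY1 = "# tty1 - getty\nrespawn\nexec /sbin/getty 38400 tty1\n"

with tempfile.TemporaryDirectory() as tmp:
    event_dir = Path(tmp)
    (event_dir / "tty1").write_text(SAMPLE_TTY1)
    result = install_xvc0_job(event_dir).read_text()
```

On a real guest you would point `install_xvc0_job` at the guest's “etc/event.d” directory instead.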

I also found this, which led me to remove the offending “hwclock” entries; that took care of some hang time.

Also, I did this, which led me to replace “sda” with “xvda” in my guest’s Xen configuration and its “etc/fstab”, just to continue following the general direction Xen is going, although it did not fix any issues.

-

I decided to use djbdns on a DNS cache server. As I was setting this up in one of my newly created Xen virtual machines, I found these instructions useful (minus the small part about tinydns, as I wanted a DNS cache service, i.e., dnscache); however, I did note one issue on my Ubuntu Gutsy install with regard to the contents of “etc/event.d/svscan” conveyed in those instructions - the use of “runlevel-” was incorrect.

In other words, under Ubuntu Gutsy (7.10), the “etc/event.d/svscan” contents should be something like what follows.

# svscan - daemontools — http://www.froyn.net/blosxom/blosxom.cgi/2007/1/12
#

# This service maintains an svscan process from the point the system is
# started until it is shut down again.

start on runlevel 2
start on runlevel 3
start on runlevel 4
start on runlevel 5

respawn
exec /command/svscanboot

-

I briefly noted this paper on training users about phishing.

Our embedded training system works roughly as follows.
People are periodically sent training emails, perhaps from
their system administrator or from a training company.
These training emails look just like phishing emails, urging
people to go to some website and log in. If people fall for
the training email and click on a link in that email, we
provide an intervention that explains that they are at risk for
phishing attacks and gives some tips for protecting
themselves.

Any manager that has ever had to train (or discipline) an employee can probably relate to some of these lessons learned.

· Embed the training into users’ regular activities so they do not have to go to a separate website to learn about phishing attacks.
· Make it clear why users are being warned—for example, what the risks are and what caused the warning.
· Do not delay the warnings; present them immediately after the user clicks on the link.
· Use training messages with the same content that users have just seen, as this helps them concretely relate to what is being discussed in the training message.
· Supplement training text with story-based graphics and annotations.
· Keep the training messages simple and short. One reason the security notices did not work well was too much text.
· Give clear actionable items that participants can easily do to protect themselves.

The “Embed the training into users’ regular activities” lesson reminded me of what I discussed here at some length.

Now, let’s look at a much better example. In preparation for security training, you have someone sit outside your organization auditing whether people were taking off their ID badges before leaving work, as was mandatory. As part of this audit, the auditor photographs the ID badge of someone leaving the offices still wearing their ID badge. The next day, that person comes to work to find you sitting at their desk wearing an ID badge with their name but your face. Now, that would make an impact.

However, the actual implementation used in the paper seemed to lack the impact of my example, in that, the people in the study were asked to play someone else, which separated those people from the actions being taken and the consequences of those actions. There was no real world experience here to drive the lessons home.

Oh, and here is our old phishing lesson.

And, what techniques did we employ?

  • Wariness - Email should not be trusted by default. Examine email messages closely, especially if they request sensitive information, contain attachments, or provide links, and be sure to establish trust before performing any actions requested by the messages. This is just how I think, and it helps in avoiding scams.
  • Research - When in doubt, do some research. (An easy way to do this for messages claiming to be from a company or person you deal with is just call the company or person.) By looking at the raw message, I could see weird characteristics in the headers and the message body that indicated this was a fraud. I then used information taken from the headers and message body to identify proper abuse contacts at the relevant ISPs.
  • And the easiest one…

  • Multiple email accounts with dedicated purposes - By having specialized email accounts that are used only for certain purposes, many scam messages can be quickly identified just by being out of place. (You can also track the dissemination of your email addresses quite well by doing this.) That is how I knew this message was a scam before I even read its contents.
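For those who want to poke at raw messages themselves, here is a rough sketch using Python's standard `email` package. The message, the addresses, and the particular checks (mismatched sender domains, delivery to the wrong dedicated address) are my own illustrative assumptions; a real investigation would read the Received chain much more carefully.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical raw message; every address and host here is made up.
RAW = """\
Return-Path: <bounce@mailer.example.net>
Received: from mail.example.net (mail.example.net [192.0.2.10])
\tby mx.example.org; Thu, 8 Nov 2007 10:00:00 -0500
From: "Your Bank" <security@bank.example.com>
Reply-To: collector@example.net
To: shopping-only@example.org
Subject: Please verify your account

Click here to log in.
"""


def domain(header_value):
    """Extract the lowercased domain from an address header (or '' if absent)."""
    address = parseaddr(header_value or "")[1]
    return address.rpartition("@")[2].lower()


def red_flags(raw, expected_recipient):
    """Return (list of warnings, list of Received headers) for a raw message."""
    msg = message_from_string(raw)
    flags = []
    from_domain = domain(msg["From"])
    for name in ("Reply-To", "Return-Path"):
        other = domain(msg[name])
        if other and other != from_domain:
            flags.append(f"{name} domain {other} differs from From domain {from_domain}")
    # The dedicated-account trick: mail from this sender should only hit one address.
    if parseaddr(msg["To"] or "")[1].lower() != expected_recipient.lower():
        flags.append("delivered to an address not used for this sender")
    return flags, msg.get_all("Received", [])


flags, received = red_flags(RAW, expected_recipient="bank-only@example.org")
```

The Received headers returned alongside the warnings are where you would start looking for the hosts and networks to report abuse to.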

-

Finally, UAVs have been getting a little attention in my circles for quite a while. So, this article might interest some of you.

Having evolved from military use, drones, or unmanned aerial vehicles (UAVs), are taking to the air in increasing numbers for public-service and civilian roles. They are being operated by groups as diverse as police, surveyors and archaeologists. A UAV helped firemen track the blaze that recently ravaged southern California. [...]

Reminded me of some of the things we did with model airplanes back in the day.

Quickies - social, rngs, remail, moon

Friday, September 28th, 2007

Time for some quickies…

This short article seemed in line with some of my ramblings here (e.g., this and this), albeit quite sparse.

Enterprises must invest more heavily in staff training and social engineering tests to ensure corporate data cannot be compromised by outsiders who trick their way into the company, according to experts at this years ISSE event in Warsaw.

I discussed training and testing in a long ramble, where I sort of combined the training and testing into one service. I thought along the lines of utilizing testing not only to monitor the effectiveness of security efforts (including training) and to identify security strengths and weaknesses, but also to help tailor the training to a particular organization based on real feedback and to provide relevant examples during the training that have a strong impact on attendees. The third part of this ramble covers these thoughts in a messy fashion.

Now, pull the results of these attacks into your security training. Will there be an impact?

Well, I think so. You don’t quickly forget seeing yourself and/or those around you up in lights, as it were, and the attacks can certainly be used to increase the sense of responsibility felt by every employee. The demonstrations hit home because they can be related to - the attacks happened to you, your neighbors, your community. Whether the attacks succeed or fail, they amount to a shared experience for the organization, and teach people their importance to the security of their organization.

The next paragraph of that article goes on to say…

Sharon Conheady, a consultant in social engineering for consultancy Ernst & Young, explained that the scale of the problem is often underestimated by firms, because many are unaware it is even going on. She revealed criminals are using tools such as Google and company web sites to research and gather information about a particular firm, before conning their way into the building with the aim of stealing sensitive data.

I talked about this sort of intelligence gathering back a few months ago in this post.

Mapping out organizations is often one of the tools used in social engineering, and there is a wealth of information to be gathered from OSINT and HUMINT. For example, when you can talk the organizational lingo, it is much easier to convince people within that organization you can be trusted.

Of course, while I like playing with in-person attacks (with a bit of a focus on beauty) and would wager that conning one’s way into many a building is quite trivial, physically entering a place and/or interacting with people can be a high-risk game. As such, phone calls, email/IM, etc. might be the more popular, safer mediums here for most attackers.

So, I’d be curious to know the “scale of the problem” out there in the real world. I know I play with this in some ways, but not enough to have generalizable, concrete numbers. Perhaps the presentations on which this article was based had some detail here.

In any case, I think the testing and training mentioned here can be applied equally well.

(That reminds me - just entering my reading queue, “Choices, Values, and Frames” edited by Daniel Kahneman and Amos Tversky.)

-

Since lots of FIPS people visit this blog, I thought pointing out this paper might be useful, which looks at the security properties of the underlying primitives utilized in the PRNGs recommended by NIST in SP 800-90 in relation to the security of the output of the said PRNGs. Their conclusions are as follows.

In general, block cipher DRBG should not be used in any circumstances. If block cipher DRBG is currently implemented, then, if possible, one should change the DRBG immediately; otherwise, one should be certain that the outputs generated are as short as possible relative to the output block size of the block cipher used. The other three DRBGs are secured under the current knowledge we have about the underlying mechanisms. Elliptic curve DRBG is the most computationally expensive DRBG. However, it is also the one that is the best understood. ECDLP has been researched for many decades and is still believed to be a hard problem, whereas many hash functions are failing collision resistance as time passes. Two major advantages elliptic curve have over hash or HMAC based DRBG are that the maximum length of output is significantly greater and that it is a number theoretic based instead of heuristic based. If elliptic curve computations can be done as efficiently as hashing (via improve algorithms or hardware), then elliptic curve DRBG is currently the best DRBG out of the recommendations from NIST.
If elliptic curve DRBG is implemented, it is necessary to not follow the NIST generation process exactly due to the poor choice of truncation function1. Refer to Appendix B of [4] for more detail on TPP.
A few other cares must be taken before a DRBG is implemented. See Appendix C for a brief discussion on additional aspects to consider.

FYI, I think the block cipher based PRNGs (namely the ANSI X9.31 Appendix A.2.4 PRNG and its NIST derivatives) are the most commonly implemented PRNGs in the FIPS world due to their simplicity and open source availability. A quick scan of this list might confirm or deny that.

Update: I haven’t directly dealt with modules that implement the EC based PRNG recommended in 800-90, but, before you do undertake such an implementation, you may want to take note of these slides. [via Schneier commentary]

On the Possibility of a Back Door in the NIST SP800-90 Dual Ec Prng

Conclusion
• WHAT WE ARE NOT SAYING:
NIST intentionally put a back door in this PRNG
• WHAT WE ARE SAYING:
The prediction resistance of this PRNG (as presented in NIST SP800-90) is dependent on solving one instance of the elliptic curve discrete log problem.
(And we do not know if the algorithm designer knew this before hand.)


-

I’ve mentioned remailers on here, so I thought this might interest some of you. It sounds like the mixmaster and mixminion remailer networks have suffered from a bit of a flood in traffic, malicious or otherwise.

The Mixmaster flood we saw in the last two weeks
is unprecedented, as volume of mail that flow
*successfully* throught the remailer.
Most trafficked ones remailers a tenfold raise of their
normal traffic for days; George peaked as 60K message
a day.

Nothing major it seems, but it did reinforce the need for a new release of mixminion.

Mixminion 0.0.8alpha3 is now available. It fixes a bug that crashed some servers over the last month; if you’re running a server, you should upgrade. There are some other small bugs fixed too.

Anyway, I thought these words important.

In the mean while, if people can post stats about the flood, and if someone wants to port my timestamp hack to Mixminion and start recording the rate at which these messages are arriving and share that data, it may help. (Just don’t do any kind of logging that could reveal identity/break anonymity. Let’s hold remop ethics to a higher standard than some recent Tor operators have shown.)[1]

-

Finally, I found this a bit interesting.

CAPE CANAVERAL, Florida (Reuters) - Web search leader Google Inc. will sponsor a $30 million competition for an unmanned lunar landing, following up on the $10 million Ansari X Prize that spurred a private sector race to space.

Of course, I must point out Koman’s Kings Of The High Frontier is sitting on my bookshelf.

Tor yet again

Thursday, September 13th, 2007

Ok, so recently this surfaced and caused a little stir in the Tor world.

Last month, Swedish security specialist Dan Egerstad exposed the passwords and login information for 100 e-mail accounts on embassy and government servers. In a blog entry today, Egerstad disclosed his methodology. He collected the information by running a specialized packet sniffer on five Tor exit nodes operated by his organization, Deranged Security.

This has been discussed here in some form before, and I commented on the trustworthiness of nodes in another form before that.

While there is often logic thrown about that the larger the number of nodes in a mixnet, the stronger the anonymity, such logic often leaves out one key point, the honesty of the nodes. There is a reason remailer networks have traditionally been very small and (at least partially) maintained by some “trusted” nyms. Such honesty problems are not trivial to solve.

So, the real questions here revolve around how well Tor can be made to scale while still achieving its anonymity goals and remaining usable. Increasing popularity not only attracts the attention of more adversaries, but it also means the Tor network itself must grow. With such strong honesty requirements for certain nodes in the network, this growth could pose some particularly interesting problems. (Of course, I am biased in that I always seem to find questions of reputation interesting.)

Once your traffic leaves the Tor network through an exit node, it is no longer, well, in the Tor network. This means the traffic is visible to the exit node (and anyone it passes through from there) without any of the encryption that was utilized as the traffic flowed through the Tor network. If the traffic is plaintext, then the exit node can easily snoop on that traffic. If the traffic is encrypted (say by being piped through an SSL or SSH session), then the exit node could attempt a MITM attack on that traffic, and, especially with regards to the average person using a web browser, many times a user will just ignore any warning dialogs that pop up indicating such an attack.

In this regard, a post was written to Practical Security asking what can be done to mitigate this risk with regard to the theft of pseudonyms on WWW discussion forums without SSL. I said the following.

1. Use pgp to digitally sign all forum posts. Then your pseudonym is tied to a pgp key pair and not just an account provided by the WWW forum software. If your forum account gets compromised, move on and just keep signing with the same private key - you may even be able to use digital signatures to convince the forum mods to help you here.

2. Rather than utilize rogue nodes when posting to a vulnerable WWW forum, avoid rogue nodes by setting options in torrc (e.g., configuring a set of ExitNodes and turning on StrictExitNodes) to only use particular exit nodes that you deem trustworthy enough not to snoop and such. The lower risk of snooping might be worth the possibly lesser anonymity.

Side note: As another option for performing 2 above, I just noticed the following in here

- If a user tries to connect to or resolve a hostname of the form <target>.<servername>.exit, the request is rewritten to a request for <target>, and the request is only supported by the exit whose nickname or fingerprint is <servername>.
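To make the two approaches concrete, here is a minimal sketch. The helper names and the example nicknames are made up for illustration; only the ExitNodes and StrictExitNodes options and the <target>.<servername>.exit form come from the text above.

```python
def exit_hostname(target, servername):
    """Build a <target>.<servername>.exit hostname, per the excerpt above."""
    return "%s.%s.exit" % (target, servername)


def torrc_exit_lines(exit_nodes):
    """Render the two torrc options from point 2 for a list of exit nicknames."""
    return "\n".join([
        "ExitNodes " + ",".join(exit_nodes),
        "StrictExitNodes 1",
    ])


# Hypothetical forum host and exit nicknames, for illustration only.
print(exit_hostname("forum.example.org", "trustedexit1"))
print(torrc_exit_lines(["trustedexit1", "trustedexit2"]))
```

The first form pins a single request to one exit, while the torrc lines restrict every circuit, so the anonymity trade-off mentioned in point 2 applies more broadly to the second.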

Now, thinking along the lines of 2 above, I thought it might be nice to excerpt part of a discussion I had a while back.

Trust requirements are not new to mixnets, but the performance objectives of Tor and its large, dynamic network make the issue all the more challenging. AFAIK (and I really don’t know), most P2P metrics can really tell you nothing about the anonymity intentions of a mixnet node - they can cover things like performance or fairness, but you need something like, say, automated open source intelligence gathering tied in with some sort of behavior modeling or, much much simpler, people intelligence (e.g., the Tor directory server operators), to build a “trust” metric for a mixnet node.

And,

From my take, the core issue here is that the Tor network relies on its nodes (including their operators) to be trustworthy; however, the network currently has no scalable metrics that define and speak to that trust. New nodes are essentially just accepted by the directory servers. From there, performance metrics seem to be used to rank a node’s trustworthiness, if you consider, say, being potentially selected as a guard basically a statement of trust about a node.

My comments on performance being the basis of trust for nodes partially came from a version of this. E.g.,

When a router posts a signed descriptor to a directory authority, the authority first checks whether it is well-formed and correctly self-signed. If it is, the authority next verifies that the nickname in question is not already assigned to a router with a different public key. Finally, the authority MAY check that the router is not blacklisted because of its key, IP, or another reason.

“Stable” — A router is ‘Stable’ if it is active, and either its Weighted MTBF is at least the median for known active routers or its Weighted MTBF is at least 10 days. Routers are never called Stable if they are running a version of Tor known to drop circuits stupidly. (0.1.1.10-alpha through 0.1.1.16-rc are stupid this way.)

To calculate weighted MTBF, compute the weighted mean of the lengths of all intervals when the router was observed to be up, weighting intervals by $\alpha^n$, where $n$ is the amount of time that has passed since the interval ended, and $\alpha$ is chosen so that measurements over approximately one month old no longer influence the weighted MTBF much.

“Fast” — A router is ‘Fast’ if it is active, and its bandwidth is either in the top 7/8ths for known active routers or at least 100KB/s.

“Guard” — A router is a possible ‘Guard’ if it is ‘Stable’ and its bandwidth is either at least the median for known active routers or at least 250KB/s. If the total bandwidth of active non-BadExit Exit servers is less than one third of the total bandwidth of all active servers, no Exit is listed as a Guard.
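The quoted rules can be sketched roughly as follows. This is an illustration under stated assumptions, not the directory authority code: the function names, the data layout, the choice of days as the unit for $n$, the value of alpha, and reading “KB” as 1024 bytes are all mine. The Tor version exclusion and the Exit bandwidth rule for Guard are left out.

```python
from statistics import median

DAY = 86400  # seconds


def weighted_mtbf(intervals, now, alpha=0.9):
    """Weighted mean of uptime interval lengths.

    intervals: list of (start, end) times in seconds when the router was up.
    Each interval is weighted by alpha**n, with n taken as the number of days
    since the interval ended (the unit and alpha=0.9 are assumptions).
    """
    total = 0.0
    weight_sum = 0.0
    for start, end in intervals:
        n = (now - end) / DAY
        weight = alpha ** n
        total += weight * (end - start)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def assign_flags(routers, min_stable_mtbf=10 * DAY):
    """routers: dict nickname -> {"mtbf": seconds, "bandwidth": bytes/s}.

    All routers passed in are assumed to be active.
    """
    mtbf_median = median(r["mtbf"] for r in routers.values())
    bw_sorted = sorted(r["bandwidth"] for r in routers.values())
    bw_median = median(bw_sorted)
    # Routers at or above this value are in the top 7/8ths by bandwidth.
    bw_cutoff = bw_sorted[len(bw_sorted) // 8]
    flags = {}
    for name, r in routers.items():
        f = set()
        if r["mtbf"] >= mtbf_median or r["mtbf"] >= min_stable_mtbf:
            f.add("Stable")
        if r["bandwidth"] >= bw_cutoff or r["bandwidth"] >= 100 * 1024:
            f.add("Fast")
        if "Stable" in f and (r["bandwidth"] >= bw_median
                              or r["bandwidth"] >= 250 * 1024):
            f.add("Guard")
        flags[name] = f
    return flags
```

Note that every input here is uptime or bandwidth. Nothing in these rules measures whether an operator is honest, which is the point being made.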

So, this discussion went on to convey the following, which is the meat of this post.

You know, I use certain remailers in a chain because I know, or know of, the operators, and I trust them to run a remailer as it should be run. The same goes for my acceptance of remailers in general or Tor.

Which is to say, the reputations certain people have built with me are what I use as a base for my decisions to employ their software and services. Such people’s reputations come from many sources for me. For example… For the people I know, reputations may have been built from all the things that make me say they are people I know, such as debating some aspect of life while sucking down alcohol in a bar together. For the people I know of, reputation may have been built from references from people I know or know of. For both people I know and people I know of, reputation may have been built from longer term exposure to papers, presentations, conversations, discussions, etc. in the appropriate forums. This probably maps to how current directory server operators make their decisions about Tor nodes, and all of these can be mapped to features of a decent social networking service, which is potentially scalable.

Taking a hint from the whole web 2.0 bang, we find lots of people that have links with lots of other people, and the number and quality of such links influences how much value we give to people’s opinions. The quality of feedback influences the reputation of the person providing the feedback, which in turn translates to how we value that person’s feedback, which is to say, how much that feedback affects the reputation of the thing being commented on. And so on. This people intelligence that gathers and mines data can lead to a large set of very interesting data, which can then undergo automated analysis. Now, forget web 2.0 though and go old school: the whole PGP web of trust is built on this type of thing.

You could see a Tor social network working in a similar way. People input followed by automated analysis could be the end result. For example, perhaps a system could be created such that nodes (i.e., node operators) are able to make trust statements about other nodes. The set of statements would be predefined, such as distrust, no trust, marginal trust, or full trust, and have guidelines that node operators should follow to make a determination of trust in a node. The weight of a node’s statements of trust about other nodes could be based on things like that node’s trustworthiness as determined by statements of trust made by other nodes about that node.
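As a rough sketch of how such weighting might be computed: each node’s score is the average of the statements made about it, with each statement weighted by the current score of the node making it. The numeric values for the four levels, the fixed set of seed nodes (think directory server operators) that anchor the computation, and all names here are my assumptions, not part of any existing Tor mechanism.

```python
# Assumed numeric values for the predefined trust statements.
LEVELS = {"distrust": -1.0, "no trust": 0.0, "marginal": 0.5, "full": 1.0}


def trust_scores(statements, seeds, rounds=10):
    """statements: dict rater -> dict target -> level name.
    seeds: nodes whose score is pinned at 1.0 (an assumption to anchor things).
    """
    nodes = set(statements) | set(seeds)
    for targets in statements.values():
        nodes.update(targets)
    scores = {n: (1.0 if n in seeds else 0.0) for n in nodes}
    for _ in range(rounds):
        updated = dict(scores)
        for node in nodes:
            if node in seeds:
                continue
            num = 0.0
            den = 0.0
            for rater, targets in statements.items():
                if rater == node or node not in targets:
                    continue
                # Statements from distrusted or unknown raters carry no weight.
                weight = max(scores[rater], 0.0)
                num += weight * LEVELS[targets[node]]
                den += weight
            updated[node] = num / den if den else 0.0
        scores = updated
    return scores


# Hypothetical nodes: "auth" is a seed, "m" is distrusted and vouches for "x".
example = {
    "auth": {"a": "full", "b": "marginal", "m": "distrust"},
    "a": {"c": "full"},
    "b": {"c": "marginal"},
    "m": {"x": "full"},
}
scores = trust_scores(example, seeds={"auth"})
```

In this example the vouching by the distrusted node counts for nothing, so "x" stays at zero, while "c" ends up between marginal and full because its two raters carry different weight.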

With some such metric in place, interesting decisions about a user’s path through the Tor network could be made. The obvious example is requiring specific trust levels for particular places in a user’s path through the Tor network, such as the entry and exit requiring highly trusted nodes but middleman only requiring marginally trusted nodes. As a more interesting example, I imagine a reputation web like that described would lead to clusters of closely linked nodes. Perhaps it would make sense to choose a path through the Tor network that picks nodes from different clusters. Or, if I have made trust statements about the nodes, it might make sense to heavily weight those statements in choosing a path for me - for example, if I fully trust a certain set of nodes, maybe my path through the network should be selected from the set of nodes I fully trust, nodes those nodes fully trust, and so on until there is a “large enough” set of nodes to choose from.
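The last example, growing a pool outward from the nodes I fully trust, is essentially a breadth-first walk over “full trust” statements. A minimal sketch follows, with hypothetical function names and node names; the stopping rule (stop as soon as the pool reaches the minimum size, even mid-hop) is one possible reading of “large enough”.

```python
import random
from collections import deque


def trusted_pool(my_trusted, full_trust, min_size):
    """Collect nodes I fully trust, then nodes they fully trust, and so on.

    full_trust: dict node -> list of nodes that node fully trusts.
    Stops once the pool holds min_size nodes or nothing is left to add.
    """
    pool = []
    seen = set()
    queue = deque(my_trusted)
    while queue and len(pool) < min_size:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        pool.append(node)
        queue.extend(full_trust.get(node, []))
    return pool


def pick_path(pool, length=3, rng=random):
    """Pick distinct nodes for a path (entry, middle, exit) from the pool."""
    if len(pool) < length:
        raise ValueError("trusted pool is too small for a path")
    return rng.sample(pool, length)


# Hypothetical trust statements.
full_trust = {"a": ["c", "b"], "b": ["d"], "c": ["e"], "d": []}
pool = trusted_pool(["a", "b"], full_trust, min_size=4)
path = pick_path(pool)
```

A smaller pool means more trust in each hop but fewer possible paths, so the minimum size is where the trust versus anonymity trade-off shows up.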

Finally, what I said here was pointed out to me.

In other words, right now, Tor uses the vigilance of people to ward off this type of attack. Such an approach might not scale well, but it works at the moment.

I don’t know if any of this scales well either or if it does more harm than good, but it is an idea nonetheless.

Oh, and for those asking about my Tor node. You know that lovely new motherboard? Well, it appears to have choked.

Tor and Xen and such, Flash BIOS

Tuesday, May 29th, 2007

So, I set up a Tor server in a Xen virtual machine on Ubuntu running on that J7F4-based system over the weekend. Right now, no custom kernel and no crypto hardware acceleration, but that will change when I have some more time.

The general setup was easy enough. I had one primary disk for booting the box’s OS (or dom0 when booting for Xen), and then a separate disk array for data (and domU’s). I used LVM to slice up the disk array.

General setup process:

  • Installed Ubuntu on the primary disk.
  • Installed Xen from the Debian/Ubuntu packages.
  • Used LVM to create volumes for root (512MB), swap (512MB), and var (2048MB) for my VM to run Tor.
  • Used mkfs (and mkswap) to pop the fs on the volumes.
  • Used debootstrap to install the base Ubuntu feisty OS files on the root volume I created for the VM to run Tor.
  • Created a tor VM configuration. Gave it 128MB of RAM, and one ethernet device with static MAC. The three volumes I created above are configured as ioemulated disks. Used root volume as root device and left it writable for now.
  • Mounted root volume. Modified basic system configuration installed by debootstrap in the VM’s root volume. Modified fstab to properly mount the ioemulated disks configured in the tor VM configuration - root volume at /, the swap volume as swap, and the var volume at /var. Set up networking.
  • Rebooted system, booting from Xen kernel upon startup.
  • Booted the VM and logged in.
  • Downloaded and installed Tor (the Tor developers provide Ubuntu packages) in the VM.
  • Configured Tor to be a middle-man node (not willing to run exit at this location). Set up iptables here too, but not really worth the effort - only Tor daemon was listening for network connections, and you basically had to allow all TCP out.
  • Once all was working, halted the VM. With VM halted, changed the VM root device in the Tor VM configuration to be read-only, as only swap and var needed to be writable.
  • Restarted VM and was good to go.
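For reference, the VM configuration described in the steps above might look roughly like this. Xen xm domain configuration files use Python syntax, so this is written as one. The kernel and initrd paths, the volume group name, the guest device names, and the MAC address are all placeholders of mine; the post does not give the actual values or the exact disk specification used.

```python
# Hypothetical /etc/xen/tor.cfg (xm domain configs are Python syntax).
kernel = "/boot/vmlinuz-2.6-xen"       # assumed path to the Xen guest kernel
ramdisk = "/boot/initrd.img-2.6-xen"   # assumed matching initrd
name = "tor"
memory = 128                           # MB of RAM for the VM

# One ethernet device with a static MAC (00:16:3e is the Xen OUI prefix;
# the remaining bytes are made up).
vif = ["mac=00:16:3e:00:00:01"]

# Root stays writable ("w") during setup; change to "r" once all is working.
root_mode = "w"
vg = "/dev/vg0"  # hypothetical LVM volume group holding the three volumes
disk = [
    "phy:%s/tor-root,hda1,%s" % (vg, root_mode),
    "phy:%s/tor-swap,hda2,w" % vg,
    "phy:%s/tor-var,hda3,w" % vg,
]
root = "/dev/hda1 ro"
```

Keeping the root mode in one variable makes the final step, flipping root to read-only after the VM is halted, a one-character change.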

I found this page useful as providing some bare minimum for setting up Xen with Ubuntu (note: I didn’t follow this setup exactly), and the Xen user manual useful for everything else. I found this FAQ useful for Tor server configuration. I found this page useful for iptables, something I hadn’t played with in quite some time (we have pf in the BSD world).

Next steps:

  • Build custom kernels for Dom0 and DomU. Be sure to support Padlock.
  • Enable Padlock support in OpenSSL.

Possible other next steps:

  • Wrap local network access of Tor client functionality with stunnel.
  • Use encrypted swap for Tor VM.
  • Store var volume used by Tor VM encrypted on disk, then rekey Tor node.

Annoying issue:

Only 512 megs of RAM is detected by the BIOS, when I actually have 1 gig installed, the maximum supported by the chipset. The RAM looks good and the board looks good, so I guess there is a compatibility issue. My plan was to run just two VMs anyway, but now the memory allocated to them is a bit smaller than expected.

Anyone out there running a J7F4 series board with the RAM maxed out? Did you experience any quirks?

Update: I never followed up on this. The compatibility issue I noted turned into a known issue with memory density. This sums it up.

[...]You need 16-chip (64M) low density memory, as the cn700 won’t support the 8-chip hi density (128M) memory. You’ll just see half the memory otherwise.[...]

Unfortunately the 128Mx64 specification is essentially useless, as it’s applied to both two-rank 16-chip and one-rank 8-chip 1GB sticks.

If you have a stick with 8 chips on one side only, then that’s 1Gbit technology and it won’t work with a CN700. If you have a two-sided 16-chip stick then most likely it will.

-

As part of my attempts to get the full amount of RAM recognized, I flashed the BIOS on the system discussed above. This system was running Ubuntu GNU/Linux and had no floppy drive. The general process I used to flash the BIOS was as follows, which was based on the steps outlined in this and this.

  • Download a bootable floppy image for FreeDOS. (e.g., fdboot.img is what I downloaded.)
  • Mount the image. (e.g., mount -o loop /home/flash/fdboot.img /home/flash/tmp)
  • Copy the “flasher” and flash to the image. fdboot.img had about 1 meg of free space, IIRC. (e.g., cp <flasher>.exe /home/flash/tmp; cp <flash>.bin /home/flash/tmp)
  • Unmount the image. (e.g., umount /home/flash/tmp)
  • Create a bootable ISO image, which will use fdboot.img for booting. Make sure fdboot.img is in the directory tree the image is created from. (e.g., cp /home/flash/fdboot.img /home/flash/tmp; mkisofs -r -b fdboot.img -c boot.cat -o /home/flash/bootcd.iso /home/flash/tmp)
  • Burn to CD.
  • Boot from the CD in the system to be flashed.
  • Choose option 2, which is FreeDOS safe mode. No drivers and such are loaded.
  • Flash the BIOS. (Of note, my BIOS was write-protected. This involved changing CMOS settings, but could easily have been jumpers.)
  • Remove the CD and reboot.

Kids and their identities

Saturday, April 28th, 2007

I found this study interesting because it is essentially a long discussion on managing identity. While it is in the context of teenagers, I think this discussion applies to most anyone and everyone interacting in the online world. Also of note, this paper brings up how parents do pay attention to what their kids do online and that kids do utilize the features of the online world to provide a physical safety buffer.

Anyway, a couple of choice paragraphs, just because they tie in with recent postings.

This differential between the sexes was reinforced by comments from our focus groups. When teens, particularly girls, talked about protection of their privacy online, their main concern was the protection of their physical self - if a piece of information could easily lead to them being contacted in person, girls would not share it readily. A middle school girl explains “If they can access you, like person to person or in any way other than [the internet], it’s not okay…Like if they can…talk to you, if they can find out where you live, that’s not okay. If you’re putting anyone in danger, it’s not all right.” But for modes of communication that were not physical or “real world,” girls were more likely to share information of that type.

Ok, so pseudonyms can provide some level of freedom from physical intimidation, and even kids get this. Good to know.

Studies of child victimization have shown that incidences of sexual abuse, physical abuse and other forms of maltreatment have been declining since the early 1990’s. Research has also shown that acquaintances and family members are responsible for most of the physical crimes committed against children.[...]

Translation: insiders pose the overwhelming threat to kids (and not random people on the Internet), but kids are generally safer today than they were 15 years ago. No shock there. (This came up in the context of Tor recently.)

Which makes me start thinking of insider risks. While we generally speak of these in terms of companies and governments, they apply to any organization, including families. For example, look at this discussion and think how it maps to the preceding.

Staff employees pose perhaps the greatest risk in terms of access and potential damage to critical information systems. As vetted members of the organization, employees are in a position of trust and are expected to have a vested interest in the productivity and success of the group. Considered “members of the family,” they are often above suspicion - the last to be considered when systems malfunction or fail.

This makes me wonder something. Since most attacks on kids come from insiders and in a very social context that perhaps has fewer reporting rules, whistleblowing and outside help seem particularly important. Here, the Internet seems to be a powerful tool, providing access to a wealth of information and a means of communication. Which brings many questions to mind, for example, in the realm of anonymity and pseudonymity… How often do victims utilize anonymity and pseudonymity in order to call attention to or get help for their situations? Do the initial steps towards breaking out of such a world involve anonymously reaching out to the outside world for help, such as researching ways to escape from abusive relationships?

And, now some idealism…

So much has been made of the evils of the Internet, and, in particular, the anonymity and pseudonymity provided by the Internet. “Mean” people willing to pounce on anyone and everyone while hiding behind pseudonyms. Pedophiles lurking around every corner. “Dangerous” information available at one’s fingertips. Criminals able to hide from law enforcement. And so on.

But, how scary is the Internet really? I’d say that pseudonym of yours just might be a great leveler of the playing field. Take away the physical threats and suddenly the world seems a lot less scary a place. Take away the source of our basic biases (e.g., looks, sex, age, race, etc.) and suddenly the world seems a lot more focused on skills, reputations, etc. Able to make our own choices about what information we access and with whom we associate (indiscriminate of physical borders) and suddenly we are all that much freer to live our lives.

With these kids growing up in a pseudonymous world and, by virtue of that, gaining a strong understanding of identity, what we really need to do now is provide such people much more control over their identities. And, to do so, we need the capability of strong pseudonymity all over the place. Which helps set the stage for a much more cypherpunkly world, even if it is a decade or so later than some expected. HA!

Back to reality…

As I write this, e-gold is being smacked down.