One Day Builds: Phase Vocoder
Introduction:
I'm a fan of Adam Savage's "One Day Builds" over at Tested.com [1], and often binge-watch multiple episodes on weekend mornings. I realized that I have a habit of my own "one day builds" in the computer-coding world, and figured I'd go ahead and write the phase vocoder I'd been thinking about, and catalogue the journey for the benefit of whomever...
Motivation:
I have some students working on a project to better understand how pitch-correction utilities such as Melodyne or Auto-Tune work. From some cursory checking around on the internet [2]-[4], it seems that the "phase vocoder" algorithm is the bread and butter of these sorts of utilities; the basics are not hard to implement, but the fine points of "making it sound good" are where the "special proprietary sauce" comes in.
Purpose:
"Why write your own code, when existing utilities already do this?" Why build anything of your own? There is a joy in making. Furthermore, I find that building my own really helps me to understand the workings of something in a way which merely reading about it can't. An example of this is the compressor model I built [4], which in the process of building, cleared up many of my misperceptions about how compressors work.
Standards:
I don't own Melodyne, but I do have Logic and Pro Tools, both of which do a "great" job of time-stretching and/or pitch-correction. "Great" in comparison to what I'm likely to build, anyway. There is also the phase vocoder in the librosa library [6] of Python audio utilities, which I'll be testing against. And yes, we'll be using Python. It's not hard to find MATLAB code out there for phase vocoding [7], but again, I don't have MATLAB. If I stick with Python, I get a host of sound libraries & FFT libraries, and could even include the result in my SHAART utility [8]. So, I intend to compare my results with librosa's; I expect librosa will produce cleaner results, but I'm curious how close I can get.
Procedure:
The phase vocoder algorithm is a combination of time-stretching and resampling, which together can achieve time-stretching by itself, pitch-shifting by itself, or both at once. We'll do the time-stretching first; pitch-shifting can then be achieved by resampling the time-stretched signal.
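To make the division of labor concrete, here's a little sketch of my own (the helper names time_stretch() and resample_to() are hypothetical placeholders for what we build below, not real library calls). Raising the pitch by a factor f without changing the duration amounts to time-stretching by f and then resampling the result back down to its original number of samples:

    # Illustration only -- time_stretch() and resample_to() are hypothetical
    # placeholders for the routines developed later in this post.
    f = 1.5                                          # e.g. pitch up a perfect fifth
    stretched = time_stretch(x, f)                   # ~1.5x as long, same pitch
    shifted = resample_to(stretched, int(len(stretched) / f))
    # 'shifted' has (roughly) the original length again; played back at the
    # original sample rate, every frequency comes out a factor of 1.5 higher.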
We're going to read in a WAV file, run it through a Short-Time Fourier Transform (STFT), and then regenerate the time-series data using an Inverse STFT (ISTFT), only with some extra spacing between the resynthesis frames.
For the STFT, we'll turn to the wondrous place that is StackExchange, and check out user Basj's code [9]...
    def basj_stft(x, fftsize=2048, overlap=4):
        hop = fftsize / overlap
        w = scipy.hanning(fftsize+1)[:-1]    # better reconstruction with this trick +1)[:-1]
        return np.array([np.fft.rfft(w*x[i:i+fftsize])
                         for i in range(0, len(x)-fftsize, hop)])

    def basj_istft(X, overlap=4):
        fftsize = (X.shape[1]-1)*2
        hop = fftsize / overlap
        w = scipy.hanning(fftsize+1)[:-1]
        x = scipy.zeros(X.shape[0]*hop)
        wsum = scipy.zeros(X.shape[0]*hop)
        for n, i in enumerate(range(0, len(x)-fftsize, hop)):
            x[i:i+fftsize] += scipy.real(np.fft.irfft(X[n])) * w   # overlap-add
            wsum[i:i+fftsize] += w ** 2.
        pos = np.where(wsum != 0)
        x[pos] /= wsum[pos]
        return x
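As a quick sanity check (my addition, not part of Basj's post), we can round-trip a test tone through these two routines; wherever the summed window is nonzero, the reconstruction should match the input to within round-off:

    # Quick round-trip test of basj_stft / basj_istft
    # (assumes the code above and its numpy/scipy imports are loaded)
    import numpy as np
    fs = 44100
    t = np.arange(fs) / float(fs)
    x = np.sin(2*np.pi*440.0*t)             # one second of A440
    y = basj_istft(basj_stft(x))
    n = len(y) - 2048                       # ignore the uncovered tail of the output
    print(np.max(np.abs(x[:n] - y[:n])))    # should be round-off-level tiny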
Ok, so all we'll do now is modify the Inverse Short-Time Fourier Transform (ISTFT) so that it spaces out its resynthesis frames in time...
    def my_istft(X, scale_t, overlap=4):
        fftsize = (X.shape[1]-1)*2
        hop = int(fftsize / overlap * scale_t)   # space the resynthesis frames out by scale_t
        w = scipy.hanning(fftsize+1)[:-1]
        x = scipy.zeros(X.shape[0]*hop)
        wsum = scipy.zeros(X.shape[0]*hop)
        for n, i in enumerate(range(0, len(x)-fftsize, hop)):
            x[i:i+fftsize] += scipy.real(np.fft.irfft(X[n])) * w   # overlap-add
            wsum[i:i+fftsize] += w ** 2.
        pos = np.where(wsum != 0)
        x[pos] /= wsum[pos]
        return x
With all that said, here's our first time-stretching code, and believe me, it sounds TERRIBLE...
    #!/usr/bin/env python
    # vocoder.py:  Scale an input waveform by amounts in time & frequency (independently)

    import sys
    import scipy.io.wavfile as wavfile
    import scipy.signal as signal
    import scipy
    import numpy as np
    import pylab as pl
    import math

    # stft and istft code taken from user Basj's post at http://stackoverflow.com/a/20409020
    def basj_stft(x, fftsize=2048, overlap=4):
        hop = fftsize / overlap
        w = scipy.hanning(fftsize+1)[:-1]    # better reconstruction with this trick +1)[:-1]
        return np.array([np.fft.rfft(w*x[i:i+fftsize])
                         for i in range(0, len(x)-fftsize, hop)])

    def basj_istft(X, overlap=4):
        fftsize = (X.shape[1]-1)*2
        hop = fftsize / overlap
        w = scipy.hanning(fftsize+1)[:-1]
        x = scipy.zeros(X.shape[0]*hop)
        wsum = scipy.zeros(X.shape[0]*hop)
        for n, i in enumerate(range(0, len(x)-fftsize, hop)):
            x[i:i+fftsize] += scipy.real(np.fft.irfft(X[n])) * w   # overlap-add
            wsum[i:i+fftsize] += w ** 2.
        pos = np.where(wsum != 0)
        x[pos] /= wsum[pos]
        return x

    def my_istft(X, scale_t, overlap=4):
        fftsize = (X.shape[1]-1)*2
        hop = int(fftsize / overlap * scale_t)   # space the resynthesis frames out by scale_t
        w = scipy.hanning(fftsize+1)[:-1]
        x = scipy.zeros(X.shape[0]*hop)
        wsum = scipy.zeros(X.shape[0]*hop)
        for n, i in enumerate(range(0, len(x)-fftsize, hop)):
            x[i:i+fftsize] += scipy.real(np.fft.irfft(X[n])) * w   # overlap-add
            wsum[i:i+fftsize] += w ** 2.
        pos = np.where(wsum != 0)
        x[pos] /= wsum[pos]
        return x

    #----------- parse command line arguments and read wav file into memory -----
    if len(sys.argv) < 4:
        sys.exit('Usage: %s <wav_filename> <time_scaling> <freq_scaling>' % sys.argv[0])
    wav_filename = sys.argv[1]
    scale_t = np.float(sys.argv[2])
    scale_f = np.float(sys.argv[3])

    print "reading file ", wav_filename, "..."
    samplerate, data = wavfile.read(wav_filename)
    if (len(data.shape) > 1):        # take left channel of stereo track
        data = data[:,0]

    # convert integer data to floats, and scale by max val
    sig = 1.0*data
    maxval = np.amax(sig)
    print "maxval = ", maxval
    sig = sig / maxval
    #--------------------- end of preparing input file --------------------

    sigSTFT = basj_stft(sig)
    sig2 = my_istft(sigSTFT, scale_t)

    # save output file
    newname = wav_filename.replace(".wav", "_out.wav")
    print "saving output file ", newname
    wavfile.write(newname, samplerate, sig2)
The output sounds really "phasey." A sample Led Zeppelin clip shifted in frequency by a multiplicative factor of 1.5 (by running "./vocoder.py theocean_short.wav 1.0 1.5") yields this audio clip.
The above code is a "vocoder," but it's not a "phase vocoder," because we made no attempt to make the phases line up between the resynthesized frames.
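For reference, here is a minimal sketch of my own (loosely following the standard phase-propagation step described by Laroche & Dolson [10], not the code we end up with below) of what "making the phase line up" involves: each bin keeps a running phase, which is advanced every frame by that bin's expected phase increment plus its measured deviation, and the frames are rebuilt from the original magnitudes and the accumulated phase.

    import numpy as np

    def phase_propagate(X, hop_a, hop_s):
        """Sketch: rebuild STFT frames X (analysis hop hop_a) so that phases stay
        coherent when the frames are overlap-added with a different hop hop_s."""
        fftsize = (X.shape[1] - 1) * 2                               # rfft bins -> FFT size
        omega = 2*np.pi * np.arange(X.shape[1]) * hop_a / fftsize    # expected advance per analysis hop
        phase = np.angle(X[0])
        X2 = np.empty_like(X)
        X2[0] = np.abs(X[0]) * np.exp(1j*phase)
        for n in range(1, X.shape[0]):
            dphi = np.angle(X[n]) - np.angle(X[n-1]) - omega         # deviation from the expected advance
            dphi -= 2*np.pi * np.round(dphi / (2*np.pi))             # wrap to [-pi, pi]
            phase += (omega + dphi) * (float(hop_s) / hop_a)         # true per-hop advance, rescaled to the synthesis hop
            X2[n] = np.abs(X[n]) * np.exp(1j*phase)
        return X2

In principle, frames rebuilt this way can be overlap-added at the new hop without the "phasey" smearing heard above.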
At this point it's probably worth it to try out the librosa example to hear a 'good' phase vocoder. So we'll add the following bits of code in the appropriate place(s)...
    import librosa

    vocoder_type = 'librosa'          # 'mine' or 'librosa'

    if ('librosa' == vocoder_type):
        sig2 = librosa.effects.time_stretch(sig, 1.0/scale_t)   # time_stretch actually time-shrinks!
        half_steps = 12 * np.log2(scale_f)
        sig2 = librosa.effects.pitch_shift(sig2, samplerate, n_steps=half_steps)
    else:
        # ...do all my stuff
...Yeah, that works great, as expected.
It turns out that it's better to stretch the STFT in time first and fill in the missing parts via interpolation, as described in the aforementioned MATLAB code [7]. With that, we get much cleaner results, provided we take care to increment the phase. (This is the "phase" part of "phase vocoder.")
In the following "full" piece of code, we can run my original vocoder, librosa's version, or my "new" vocoder:
    #!/usr/bin/env python
    # vocoder.py:  Scale an input waveform by amounts in time & frequency (independently)

    import sys
    import scipy.io.wavfile as wavfile
    import scipy.signal as signal
    import scipy
    import numpy as np
    import pylab as pl
    import math
    import librosa

    # stft and istft code taken from user Basj's post at http://stackoverflow.com/a/20409020
    def basj_stft(x, fftsize=2048, overlap=4):
        hop = fftsize / overlap
        w = scipy.hanning(fftsize+1)[:-1]    # better reconstruction with this trick +1)[:-1]
        return np.array([np.fft.rfft(w*x[i:i+fftsize])
                         for i in range(0, len(x)-fftsize, hop)])

    def basj_istft(X, overlap=4):
        fftsize = (X.shape[1]-1)*2
        hop = fftsize / overlap
        w = scipy.hanning(fftsize+1)[:-1]
        x = scipy.zeros(X.shape[0]*hop)
        wsum = scipy.zeros(X.shape[0]*hop)
        for n, i in enumerate(range(0, len(x)-fftsize, hop)):
            x[i:i+fftsize] += scipy.real(np.fft.irfft(X[n])) * w   # overlap-add
            wsum[i:i+fftsize] += w ** 2.
        pos = np.where(wsum != 0)
        x[pos] /= wsum[pos]
        return x

    def my_istft(X, scale_t, overlap=4):
        fftsize = (X.shape[1]-1)*2
        hop = int(fftsize / overlap * scale_t)   # space the resynthesis frames out by scale_t
        w = scipy.hanning(fftsize+1)[:-1]
        x = scipy.zeros(X.shape[0]*hop)
        wsum = scipy.zeros(X.shape[0]*hop)
        for n, i in enumerate(range(0, len(x)-fftsize, hop)):
            x[i:i+fftsize] += scipy.real(np.fft.irfft(X[n])) * w   # overlap-add
            wsum[i:i+fftsize] += w ** 2.
        pos = np.where(wsum != 0)
        x[pos] /= wsum[pos]
        return x

    def stretch_stft(X, scale_t):
        # time stretching: stretches the stft via linear interp.
        #    X = input STFT
        #    scale_t = time-stretching scale factor
        n = len(X)
        n2 = int(round(n*scale_t))
        X2 = np.zeros((n2, len(X[0])), dtype=complex)
        phase_counter = np.angle(X[0])
        for i2 in range(n2-2):                    # i2 counts along new spectrogram
            i = 1.0 * i2 / scale_t                # i is the "fractional index" on the original stft
            ibgn = int(i)
            di = i - ibgn                         # goes between 0 and 1
            mag = (1.0-di)*np.abs(X[ibgn,:]) + di*np.abs(X[ibgn+1,:])   # linear interpolation
            X2[i2,:] = mag * np.exp(1.j * phase_counter)                # resynthesize
            dphi = np.angle(X[ibgn+1,:]) - np.angle(X[ibgn,:])          # phase change per frame
            #dphi = dphi * scale_t   # <--- Adding this makes it worse!  (scale phase diff with time stretch)
            phase_counter = phase_counter + dphi  # compute phase for next frame
        X2[n2-1] = X[n-1]
        return X2

    #----------- parse command line arguments and read wav file into memory -----
    if len(sys.argv) < 4:
        sys.exit('Usage: %s <wav_filename> <time_scaling> <freq_scaling>' % sys.argv[0])
    wav_filename = sys.argv[1]
    scale_t = np.float(sys.argv[2])
    scale_f = np.float(sys.argv[3])

    print "reading file ", wav_filename, "..."
    samplerate, data = wavfile.read(wav_filename)
    if (len(data.shape) > 1):        # take left channel of stereo track
        data = data[:,0]

    # convert integer data to floats, and scale by max val
    sig = 1.0*data
    maxval = np.amax(sig)
    print "maxval = ", maxval
    sig = sig / maxval
    #--------------------- end of preparing input file --------------------

    vocoder_type = 'mynew'           # 'librosa', 'mynew', 'mine'
    if ('librosa' == vocoder_type):
        print 'Using librosa vocoder'
        sig2 = librosa.effects.time_stretch(sig, 1.0/scale_t)
        half_steps = 12 * np.log2(scale_f)
        sig2 = librosa.effects.pitch_shift(sig2, samplerate, n_steps=half_steps)
    elif ('mynew' == vocoder_type):
        print 'Using my new vocoder'
        sigSTFT = basj_stft(sig)
        print '   Stretching the STFT....'
        stretched_STFT = stretch_stft(sigSTFT, scale_t)
        print "resynthesizing via stft..."
        sig2 = basj_istft(stretched_STFT)
    else:
        print 'Using my vocoder'
        sigSTFT = basj_stft(sig)
        sig2 = my_istft(sigSTFT, scale_t)

    # save output file
    newname = wav_filename.replace(".wav", "_out.wav")
    print "saving output file ", newname
    wavfile.write(newname, samplerate, sig2)
The librosa output sounds the best, and I wish I understood why scaling "dphi" (in my stretch_stft code) by the time-scaling amount actually makes things worse instead of better...
Moving on, we need to resample "sig2" if we want pitch-shifting. Now, scipy has a resample() routine (scipy.signal.resample), but in my experience it's horrendously slow unless you're careful to make sure the number of samples is a power of 2. Hence the routine my_resample, which zero-pads the input array out to a power-of-two length as needed:
    def my_resample(x, newlen):
        method = 0
        if (0 == method):
            # pad signal such that its length is a power of 2 = much faster
            orig_len = len(x)
            p2_len = int(math.pow(2, math.ceil(math.log(orig_len)/math.log(2))))
            x3 = np.zeros(p2_len)
            x3[0:orig_len-1] = x[0:orig_len-1]
            x2 = signal.resample(x3, int(newlen*p2_len/orig_len))
            x2 = x2[0:newlen-1]
        else:
            # crude fallback: decimate by striding through the input
            num = len(x)
            stride = int(num / newlen)
            x2 = np.zeros(newlen)
            for i2 in range(0, newlen):
                i = i2*stride
                x2[i2] = x[i]
        return x2
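As a quick check (my addition, assuming the script's imports and my_resample are in scope): resampling a tone to 2/3 of its length should raise its apparent pitch by a factor of 3/2 when played back at the original rate.

    import numpy as np
    fs = 44100
    t = np.arange(fs) / float(fs)
    tone = np.sin(2*np.pi*440.0*t)                   # one second of A440
    shorter = my_resample(tone, int(len(tone)*2/3))
    # played back at fs, 'shorter' lasts ~0.67 s and sounds like ~660 Hz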
With that, we just add a final call to my_resample, and include the pitch scaling in the call to the time-stretching routine, and we're done. Here's the relevant section:
    elif ('mynew' == vocoder_type):
        print 'Using my new vocoder'
        sigSTFT = basj_stft(sig)
        print '   Stretching the STFT....'
        stretched_STFT = stretch_stft(sigSTFT, scale_t * scale_f)   # stretch by the product of both factors
        print "resynthesizing via stft..."
        sig2 = basj_istft(stretched_STFT)
        print "resampling as per frequency scaling"
        newlen = int(len(sig2) / scale_f)
        sig2 = my_resample(sig2, newlen)
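So, for example, running "./vocoder.py theocean_short.wav 2.0 1.0" should give a clip twice as long at the same pitch, while "./vocoder.py theocean_short.wav 1.0 1.5" should keep the original length and shift everything up by a factor of 1.5 (a perfect fifth).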
If you want to download the full code, here's a link.
Enjoy. My day's over. :-)
-Dr H
References:
1. Adam Savage's "One Day Builds" at Tested: http://www.tested.com
2. Vague: http://theproaudiofiles.com/vocal-tuning-pitch-correction/
3. I like this one; describes the steps via pictures & non-math: https://en.m.wikipedia.org/wiki/Audio_time-scale/pitch_modification#Untangling_phase_and_time
4. Another nice description without calculus, has pictures: http://www.guitarpitchshifter.com/algorithm.html
5. My compressor model: http://www.scotthawley.com/compressor/
6. librosa: https://github.com/bmcfee/librosa
7. MATLAB phase vocoder: http://www.ee.columbia.edu/ln/rosa/matlab/pvoc/
8. SHAART: http://hedges.belmont.edu/~shawley/SHAART/
9. Basj's STFT/ISTFT Python code: http://stackoverflow.com/a/20409020
10. Laroche & Dolson phase vocoder article: http://www.ee.columbia.edu/~dpwe/papers/LaroD99-pvoc.pdf