Three Principles of Data Science: Predictability, Stability, and Computability

Okay, so this is "Three Principles of Data Science," which is a summarization of what I have been trying to do as a data scientist and statistician over the last ten years.

Let me share my view of data science, which is pretty much the Berkeley view of data science: we see data science as the re-merging of computational and statistical thinking in the context of domain problems. It is very much reflected in our curriculum. You have three components: computer science, math and statistics, and domain knowledge. Some people say data science is the intersection, others say it is the union, but I do think that, as a team, you should have all three covered.

The reason I call it a re-merging is this little story, or big story. Herman Hollerith was a statistician at the US Census Bureau around 1890. It was predicted that the census would take 13 years, which was too long because the census runs every 10 years, so he was charged with speeding up the computation. As a result he invented the Hollerith tabulating machine, and his company was one of the companies that later formed IBM. So that was an early example of statistics and computation together, and now we are back again: Berkeley has the Simons Institute, and you cannot imagine computing without data or data without computing. The two simply have to couple.

And of course the domain matters. R. A. Fisher, one of the modern founders of statistics, was very much driven by statistical genetics and, to a certain extent, Darwin's theory of evolution. Karl Pearson, too, was very much trying to give a mathematical foundation to Darwin's theory of evolution. We are coming back to that today, in the sense that statistics is again working closely with science: interpretability, mechanistic explanations of things, trying to bring statistics to science. Machine learning, coming out of a very successful period of prediction, is facing the same problem of interpretability. The EU's General Data Protection Regulation last year talks about a right to explanation and demands that algorithms be human-interpretable. Cross-validation is really helpful for avoiding overfitting in most cases, though not all, but it really doesn't address the explanation problem.

So I have been trying to bring in stability as a unified framework to bring many things together, to address reproducibility and interpretability.
On one hand, the stability of knowledge seems self-evident; it is almost an unstated principle that knowledge should be stable. Somebody actually gave me the original quote: you can go back to Plato, who in his dialogue Meno says that opinions don't have to be tied down, but knowledge has to be tied down — it has to be stable. So it is a very common-sense requirement.

And if you look at Szemerédi's theorem, there is really a stability result inside it; look at the swapping trick. That is not how I learned Szemerédi's theorem, but Terence Tao has a nice account of seven different ways of proving it. And moreover, a lot of random matrix results are actually proved through perturbation (swapping) techniques. So perturbation is really fundamental. It is fundamental for the bootstrap too, and the jackknife, which is delete-one; in machine learning people call this algorithmic stability, and there are a lot of results connecting it with generalization error and model selection. The lasso, clustering, causal inference, and differential privacy all have a stability angle to them, or can be interpreted through this framework.

So I have tried to put this together into what I call the Stability Principle: a minimum requirement for scientific reproducibility and interpretability.
So how do you apply the Stability Principle? The user specifies what the appropriate perturbations are, so you have to take the context into account to choose the perturbations — "appropriate" is a very heavy-duty word here. The perturbations are to the data or to the models. And you define your stability measure. For prediction the measure is pretty standard, and for deep learning the same way; but in deep learning you cannot interpret the parameters, they are just too unstable, too redundant. So you define the measure and then you assess the stability.

Examples of data perturbation in the i.i.d. case are the ones I mentioned: cross-validation, the bootstrap, subsampling. You can add noise to the data, or you can fit a model, take the residuals, and perturb those. One form that doesn't come from statistics is that we should really also incorporate mechanistic, model-based simulations as a form of data perturbation. Pieter Abbeel's group at Berkeley does this for a robotic arm reaching for an object: it is a lot of work for a human to do that and record it, so what they did was add data simulated from PDE models of the arm reaching, and it really helped the results. So the platform includes the traditional perturbations, but also lets us entertain ones which haven't been dealt with before.
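[Editor's sketch, not from the talk: a minimal residual-bootstrap data perturbation of a regression fit, where what gets reported is how the estimated coefficients spread across perturbed data sets. The linear model and number of replicates are assumptions for illustration.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data; in practice X, y come from the application.
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Fit once, keep the residuals.
beta_hat = ols(X, y)
residuals = y - X @ beta_hat

# Data perturbation: resample residuals and refit (residual bootstrap).
B = 200
betas = np.empty((B, p))
for b in range(B):
    y_star = X @ beta_hat + rng.choice(residuals, size=n, replace=True)
    betas[b] = ols(X, y_star)

# A simple stability summary: spread of each coefficient across perturbations.
print("coefficient means:", betas.mean(axis=0).round(2))
print("coefficient sds:  ", betas.std(axis=0).round(2))
```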
And then model perturbation. The stability paper was my Tukey Lecture for the Bernoulli Society about four years ago, and there I was trying very, very hard to connect stability to robust statistics. What I realized is that we now have so many different kinds of models. Robust statistics says: I want a method that works under both models, say the Laplacian and the Gaussian. And some models are only very vaguely defined, say through second moments or some smoothness condition. So robustness is also a kind of stability.
>>So in statistical physics, many interesting things happen at phase transitions, and there the model is not very stable.
>>Exactly.
>>So how do you deal with that in your context?
>>Actually I have a project with a postdoc on that.
>>Yeah, we have been talking about that.
>>My conjecture is exactly that we don't have stability there. I actually want to work with him to prove that; that is why it is almost impossible to do traditional statistics in that regime. So yes, we are talking, we are having meetings. I agree with you: that is not a regime where we can do a lot of statistics. I can imagine that for certain particular projections something might be doable, I don't know, but in general it is the same problem as an AR model with a coefficient on the unit circle — you already have it there, and that is a much simpler model than a statistical physics model. And even in simple regression you do ridge, right? I was talking to [INAUDIBLE] about it.
For my part, I also do matrix factorization for our genome project. There we have a non-convex problem.
>>And you have multiple modes.
>>For me, each mode is a well-defined model; with multiple modes you really have a class of models. I want my answer to be resistant, or robust, to whichever local minimum you end up in. I used that idea to select the number of components in the matrix factorization, and it works very well. Later I will show you that I have 20 models, based on different deep nets, and they all give similar prediction results; what I report are only the results supported by all 20 models, just to be on the conservative side. People also do sensitivity analysis in Bayesian modeling. There are a lot of such practices that I see as forms of model perturbation.
So here is what my group has been doing, which is also how I teach my applied statistics course: I do visualization, and I use prediction as the first benchmark to assess goodness of fit. Then, once I have good prediction models, I go back to stability and interpretation, instead of starting with t-values and all that. Prediction is very, very easy to explain to beginners, and you don't need a lot of the i.i.d.-ness: you just need some aspects of the problem to be stable across training versus testing. You can even do fuzzy prediction — you don't have to have a Gaussian model, you can use your gut feeling, that is what traders do, right? — and the method doesn't have to have a generative nature.

Computation I won't address much here, except to say that later we analyze some gradient algorithms in relation to stability; for most big data problems we run one version or another of gradient descent.

For the rest of the talk I will cover two
projects from neuroscience. The first uses data perturbation to choose tuning parameters for brain encoding models better than cross-validation alone, by building on cross-validation; the second is an ongoing project that tries to do transfer learning and characterize neurons.

The guiding principle for my group is that we like to solve scientific problems, and to do really, really careful validation of a method before we generalize it to other problems. My students embed themselves in the collaborating labs, and that takes a substantial amount of time. So we try to play a bit of the scientist: people say statistics gets to play in everybody's backyard, and I try to go a bit into the front yard too.
>>[LAUGH]
>>We work with them because lots of new data are collected with technologies that move so fast; the domain scientists haven't had time to accumulate a lot of knowledge about the new measurement technology, so it is a good opportunity for us to work on it with them. And a lot of our understanding of data processing from other areas actually helps them.
>>You could also have all your students and postdocs get faculty positions, to make sure everyone is doing statistics and machine learning in the proper language across the different domains.
>>I know, but the thing is, we don't go for the number of papers — actually more than half of my students go into industry. I hope the academic metrics will change a little bit, to look more at substance rather than the number of papers, because I don't think we can really evaluate all the papers anymore.
So this project is kind of old, from 2011. The collaboration is with Jack Gallant's lab; we have been working together for the last ten years, and I took a year off and stayed in his lab for a year. The main people are now back in Japan and in Israel, respectively.

Some of you might have seen this; it was all over the news about five years ago. The data are like this: three human subjects were lying in the fMRI machine, and the fMRI signals were measured in different visual areas. Each voxel in the fMRI image covers hundreds of thousands of neurons, so it is not a direct neural measurement, it is indirect. The subjects were watching clips of movies from YouTube and also movie trailers.

>>I don't see my cursor.
>>Your cursor's here, at the bottom of the page.
>>I see it there, but not on my screen — here we go. So on the left-hand side is the movie, the input to the brain, and this is the measurement after processing. There is a lot of processing; red means an intense response, blue means not much response.

I actually used this in a recent data science talk to show that humans have a very short attention span — we get bored very quickly. That is why we need new names: machine learning people are really good with new names, while we statisticians just keep the name "statistics" even though the field has evolved over 100 years and we do many different things. Only we understand that; others think we just do the census. So I was using this to show that we need new names, otherwise people get confused, or bored. We are not very patient. I think it is predictive coding, right? Once we can predict what's going on, we just give up and don't waste the energy to figure out what's going on.
Okay, so here is the result. Jack was all over the media, I think: NPR covered it, The Economist covered it, everybody covered this work — I even read some coverage in China. I got a few calls too, but I let Jack do the talking. My student got interviewed on Israeli television, and his grandfather was clearly pleased.

This is what people were surprised to see, and for me, I want to come back to how it works. There are basically two steps. First is the forward modeling: the key step is the Gabor features — a 3D, space-time Gabor feature set — which represents years and years of design work. After that it is simple, a regularized regression. The features are what make it work; that was a huge amount of work. So for each voxel we fit a forward model.

The reconstruction itself was actually very simple. The new data, the reconstruction data, if I remember right, were averaged over repeated presentations, so they had a much higher signal-to-noise ratio. We projected one million movie clips into the fMRI space via the forward models, found the 100 clips closest to the observed response, and averaged them. So in effect you are treating the database of one million clips as an empirical prior. But we are in about 26,000 dimensions after all the processing for the regression, and one million samples is not enough; that is why everything is blurry. I am thinking about going back and doing a manifold average to get a sharper result.
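[Editor's sketch, with made-up shapes, of the reconstruction step as described: score every clip in a library by how close its predicted fMRI response is to the observed response, then average the top 100. The forward model is abstracted as a fixed matrix; in the real pipeline it is a fitted voxel-wise encoding model.]

```python
import numpy as np

rng = np.random.default_rng(2)

n_voxels, n_clips, frame_dim = 260, 2000, 64 * 64       # shrunk for illustration
library_frames = rng.normal(size=(n_clips, frame_dim))  # stand-in for the movie-clip library

# Stand-in forward model: maps a clip to predicted voxel responses.
W = rng.normal(size=(n_voxels, frame_dim)) / np.sqrt(frame_dim)
predicted_responses = library_frames @ W.T               # (n_clips, n_voxels)

# Observed response for the movie the subject actually watched.
true_clip = library_frames[123]
observed = true_clip @ W.T + rng.normal(scale=0.5, size=n_voxels)

# Empirical-prior reconstruction: take the 100 library clips whose predicted
# responses are closest to the observed response, and average their frames.
dists = np.linalg.norm(predicted_responses - observed, axis=1)
top100 = np.argsort(dists)[:100]
reconstruction = library_frames[top100].mean(axis=0)

print("correlation with true frame:",
      np.corrcoef(reconstruction, true_clip)[0, 1].round(3))
```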
The posterior probabilities we compute actually decrease very, very slowly, but, magically, the ordering still carries information: if you take a random hundred clips you don't get a result, while if you take the top 100 you do, even though the numbers are not all that different. It is a mystery we don't quite understand — an engineering success.

So I wanted to take this project and do some neuroscience with it. Can I answer the question: where in the image is a particular voxel interested? Here are three voxels. Each feature is tied to a location in the image, and these plots show the locations, marginalizing over frequency, orientation, and time, so I am only looking at location. You can see that cross-validation seems to suggest that voxel A is interested all over the image. Neuroscientifically that doesn't make a lot of sense, because these voxels are pretty localized. So we developed ESCV — estimation stability with cross-validation — which really shrinks the support down. It still might not be right, but it is a lot more reasonable, and it is more useful as a suggestion if you want to do follow-up work.
>>Presumably, if you put all the voxels together, you would get support everywhere, yes? [CROSSTALK]
>>Yes, yes. We haven't looked at each individual [INAUDIBLE].
>>Right.
>>In theory we can actually test this, right? We can start putting stimuli at different locations, with different features, and see whether this voxel lights up.
>>Right.
>>Well, we haven't gotten there.
>>[INAUDIBLE] the first one almost, but not quite. Looks a little bit like a [INAUDIBLE]. [INAUDIBLE] Fourier transform.
>>I mean, there are a lot of correlations between the features, right? That is what gets picked up. And if you look at prediction with the ESCV model, we actually don't lose much prediction at all. That was a surprise: I was expecting the more concise model to lose some prediction accuracy on this data, because there is so much correlation, but we lost only about 1.3% over the thousands of voxels, while the size of the model was reduced by 60%.
So here is exactly what we do — I call it the P-S workflow, prediction then stability. We use cross-validation to select the lasso lambda. Then, for each lambda, we also look at a stability measure: we take the fitted values — the predictors times the estimated coefficients — from each fold of, say, ten-fold cross-validation, and look at how much they vary around their grand average, relative to its size. This is really a test statistic.

>>n is how many times you did the cross-validation, right? Because everybody has to do cross-validation anyway, this is not adding computational complexity; it builds on top of that.
>>No, no, beta_n is the estimate from the n-th fold [INAUDIBLE].
>>[INAUDIBLE]
>>Yes, yes. And with this you are essentially testing whether X beta is zero.
>>It is a test statistic.
>>So, as I said, we use the cross-validation selection as a guard on goodness of fit, and then we shrink the model to a smaller one until this stability measure is locally best; that is the choice of lambda. If I just took beta to be a constant it would be perfectly stable but would have nothing to do with my data, so the cross-validation constraint forces me to fit the real data. And it works.
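[Editor's sketch of the ES-CV idea as described above — it follows the published ES-CV procedure only loosely; the exact stability statistic, grid, and search rule here are assumptions for illustration.]

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
n, p = 200, 50
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + rng.normal(size=n)

lambdas = np.logspace(-3, 0.5, 30)
kf = KFold(n_splits=10, shuffle=True, random_state=0)

def es_statistic(lam):
    """Variance of fold-wise fitted values around their mean, normalized."""
    fits = []
    for train, _ in kf.split(X):
        model = Lasso(alpha=lam, max_iter=10000).fit(X[train], y[train])
        fits.append(X @ model.coef_ + model.intercept_)
    fits = np.array(fits)                     # (n_folds, n)
    mean_fit = fits.mean(axis=0)
    var_term = np.mean(np.sum((fits - mean_fit) ** 2, axis=1))
    return var_term / max(np.sum(mean_fit ** 2), 1e-12)

def cv_error(lam):
    errs = []
    for train, test in kf.split(X):
        model = Lasso(alpha=lam, max_iter=10000).fit(X[train], y[train])
        errs.append(np.mean((y[test] - model.predict(X[test])) ** 2))
    return np.mean(errs)

cv_errors = np.array([cv_error(lam) for lam in lambdas])
es_stats = np.array([es_statistic(lam) for lam in lambdas])

lam_cv = lambdas[np.argmin(cv_errors)]
# ES-CV choice: among lambdas at least as large as the CV choice (i.e. sparser
# models), take the one where the estimation-stability statistic is smallest.
candidates = lambdas >= lam_cv
lam_escv = lambdas[candidates][np.argmin(es_stats[candidates])]

print(f"CV lambda:    {lam_cv:.4f}")
print(f"ES-CV lambda: {lam_escv:.4f}")
```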
I’ve been using this for other projects, which tend to
not always give you the same predictions, really? Because by definition,
it’s a smaller model, so it always gives you
a more concise model. It’s just easier to look at. So we tried it We see more
non linear running for a space model. There, we can find stable
high level interactions that you lose,
there is something just noisy. You can do prediction,
you can [INAUDIBLE]. So there you lose prediction
performance by like 10% or something. Here we barely lost anything. So this is like
So that first project is data perturbation, going for stability. The next project also goes for predictive and stable associations, for interpretability, and then hopefully we can guide intervention experiments in neuroscience.

This project is led by three people in my group, together with Jack's lab; the data were collected about ten years ago and took two years to collect. It is different data: single-neuron recordings from a behaving macaque. Mike is a current student; the other two students now work for Google.
>>[INAUDIBLE]
>>No, this is single-neuron data, and not human either.

For the fMRI project we only used V1 and V2. There are two major pathways for visual information processing: one is "what", one is "where". This is the what pathway, deciding what we are looking at. And we could not use V4 for the reconstruction, because we don't have a good forward model for V4. V4 is a very difficult area, just before IT, which does object recognition: it is more sophisticated than edge detection, but not sophisticated enough to represent whole objects. People have studied it with very controlled stimuli — curvature, texture — so there are various conjectures. On the other hand, you have all this deep learning stuff. We didn't start with deep learning; usually I try to stay in cold areas, but I have been drawn into the hot area, and I will tell you why.

On AI: I am in the camp that thinks humans, at least the good humans, can still do a lot more than current CNNs, and I think that in my lifetime I am safe — the machines won't take over. I feel very happy about that. The challenge for AI is how to reproduce intuition and consciousness, which are not really defined; then again, human intelligence is not well defined either, so it is a bit of a circular argument. But I do think the future will be human-machine collaboration, and we are already kind of doing it.
So for this project we tried to answer: what do V4 neurons actually do? And we ended up having to touch on the question of how much convolutional networks resemble brain function. To my surprise, I have to say we do have some evidence that they capture something.

What happened is that we have had this project since 2013, to analyze that data. We built our own two-layer network using sparse coding — very simple, but it was state of the art for that data set in terms of prediction. What I mean by state of the art is that we use prediction performance as the benchmark for how well we can do, and that is how we compare and judge whether a result is good. Another group, as I mentioned, has a parallel line of work; they did a lot of the prediction work and published it. Our work had been presented but not yet written up, because Jack hadn't found the time. [LAUGH] When I protested that it had been four years, he said some papers took six. That is the price you pay to work with a top neuroscience lab.

I gave a talk at the [INAUDIBLE] meeting two years ago where Yoshua Bengio was also speaking, and I thought: I have been making my own nets — the question is, if we just use the nets trained on ImageNet, what would happen?
That is why I got into deep learning. I had my two best people on this project, so I knew the result would be very hard to improve — but the pretrained deep nets improved it a tiny bit, and that is how I jumped in.

The other group had somewhat different images: not fully natural images, but semi-natural images with objects imposed on top of natural backgrounds. And in some sense they did not push as hard on the interpretation side; they mainly had prediction results. It is good that we replicated their prediction finding independently, but we are now pushing very hard on the interpretation.

The data were 71 neurons from [INAUDIBLE], and we didn't find any difference between them, so we pooled them. These were well-isolated visual neurons. The experimenters used typical complex stimuli to find which ones are [INAUDIBLE] neurons — of course they can be wrong, but that is what they call [INAUDIBLE] neurons — and the stimuli were black-and-white images. We did transfer learning in three senses that I will explain, and obtained state-of-the-art prediction for our neurons. And then we have been spending two years trying to get stable interpretations of our predictive models, so that we can characterize the neurons.

So this is ImageNet: AlexNet and the other nets are trained on color images with human labels.
It is transfer learning in the sense that our black-and-white images have nothing to do with ImageNet. After people train the net on ImageNet, we just take it as a processing box: we extract the features and do a very simple regression. Their task is a very high-level visual task, on color images, with human labels; ours is a single neuron, with black-and-white images. We didn't retrain anything — we took the black-and-white image and simply copied it into three channels so the same stimulus became a "color" input; no other alteration. We ran it, and we got very good results, slightly better than our self-made model.
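[Editor's sketch of this kind of transfer learning — not the group's code; the choice of AlexNet, the layer, ridge regression, and the fake data are assumptions, and a recent torchvision is assumed for the `weights=` argument.]

```python
import numpy as np
import torch
from torchvision.models import alexnet
from sklearn.linear_model import RidgeCV

# Frozen ImageNet-pretrained net used purely as a feature extractor.
net = alexnet(weights="IMAGENET1K_V1").eval()

def layer2_features(gray_images):
    """gray_images: (n, 224, 224) in [0, 1]; returns flattened conv features."""
    x = torch.tensor(gray_images, dtype=torch.float32).unsqueeze(1)
    x = x.repeat(1, 3, 1, 1)                 # replicate grayscale into 3 channels
    with torch.no_grad():
        feats = net.features[:6](x)          # activations after the second conv block
    return feats.flatten(start_dim=1).numpy()

# Fake stimuli and fake neuron responses, just to make the sketch runnable;
# in the real project these are the black-and-white stimuli and spike counts.
rng = np.random.default_rng(4)
stimuli = rng.uniform(size=(50, 224, 224))
features = layer2_features(stimuli)
responses = features[:, :200].sum(axis=1) + rng.normal(scale=1.0, size=50)

# Simple regularized regression from CNN features to the neuron's response.
model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(features, responses)
print("in-sample correlation:",
      np.corrcoef(model.predict(features), responses)[0, 1].round(3))
```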
So this is already a human-machine collaboration: without the human labels, the ImageNet nets wouldn't work so well, so there is a lot of crowd-sourced human intelligence baked into the images and the labeling.

Now, we have other nets too — VGG, GoogLeNet — and two regression methods, so we suddenly have six models. This is our model perturbation, and we want to see how they behave in prediction and in visualization; we are only going to interpret the parts where they agree. Counting the different layers, we end up with about 20 models, and the higher layers give similar results.

This is the raw prediction performance. You can see that the first layer definitely doesn't work for V4, and from layer two on, many layers give similar performance, so I will concentrate on layer two and mention the other layers later. This is the Lasso: the three different nets give basically similar results, with a correlation of 0.92 across the different models — on this data they are indistinguishable. On average, Ridge gives the best prediction performance, using the full, pretty large number of parameters, while the Lasso shrinks the model down to about 400.

One more thing: for high signal-to-noise data you would usually refit on the selected variables to remove the lasso's bias, but for this data there is still noise to be removed; if you follow the lasso with ridge it becomes like an elastic net. This is pretty noisy data, so the OLS refit actually hurts: for a higher signal-to-noise ratio it corrects the bias, but for this data you want more regularization, and the lasso alone doesn't give enough.
Now we do the interpretation. For this particular neuron, this map is derived from the models and indicates where in the image the neuron cares about; it picks out a region. You can see it for both Lasso and Ridge, even though Lasso is quite a bit worse [INAUDIBLE]. So we are happy to see that this is probably where this neuron cares about. It is suggestive, though it is hard to say exactly what the neuron cares about there. And across the different nets the maps are also very stable, so we say that is something we trust.

We then use what we call Manifold Deep Dream to characterize the neurons. What is Deep Dream? You have a function — the model neuron — over a high-dimensional input space, the image. You take a random starting point and just follow the gradient to a local optimum, and that optimizing image is what is shown. There is a lot of consistency across random starting points.
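[Editor's bare-bones sketch of that gradient-based preferred-stimulus search — the actual Manifold Deep Dream adds constraints not reproduced here; `model_neuron` is a stand-in for the fitted CNN-features-plus-regression model.]

```python
import torch

torch.manual_seed(0)
W = torch.randn(64, 64)   # stand-in weights of the fitted model neuron

def model_neuron(image):
    # Stand-in for the fitted model neuron: any differentiable map from an
    # image to a scalar response.
    return (image * W).sum() - 0.1 * (image ** 2).sum()

def preferred_stimulus(steps=200, lr=0.1, maximize=True):
    image = torch.randn(64, 64, requires_grad=True)    # start from white noise
    sign = 1.0 if maximize else -1.0                    # minimize for inhibitory patterns
    opt = torch.optim.Adam([image], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-sign * model_neuron(image)).backward()        # ascend (or descend) the response
        opt.step()
    return image.detach()

# Consistency check across random starts, in the spirit of the talk.
runs = [preferred_stimulus() for _ in range(3)]
for other in runs[1:]:
    corr = torch.corrcoef(torch.stack([runs[0].flatten(), other.flatten()]))[0, 1]
    print("correlation with first run:", round(corr.item(), 3))
```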
I will show you a movie. These are the inhibitory patterns — where the model neuron's response goes negative — and here you can see the iterations going from white noise and converging to something, so it is pretty stable. And this is a different visualization from my group, Rebecca's work, which we call superheat; it is a way to look at high-dimensional data together with a response, so I am also advertising that data visualization tool. And this takes the black-and-white images in our database and finds the ones that most excite this model neuron. You can see there is some kind of regular texture, but we cannot really tell what it is.

This is another neuron, one of our favorites. Again Lasso doesn't quite work, because a neuron's preferred region doesn't really jump around; we are finding that Ridge is actually biologically more meaningful — the region a neuron cares about is usually contiguous, while Lasso's sparsity sometimes makes the map jump around. So Ridge is actually better here. And for this neuron the result is again pretty consistent: the region is in the middle, and this is the pattern — this neuron cares about curvature. This is the inhibitory pattern, and in the movie you can see the same thing. For the paper we cannot show a movie, so we use superheat to show that the convergence is pretty stable. You see that nothing happens here: these are the color filters, and we have black-and-white images, so nothing happens there.
some very simplified way. There should be some kind of
proof of convergence, right? I mean obviously not with this.>>Yeah, this multiple-
>>But there should be a really, really simple version of this,
right?.>>No, that’s what the hope,
I think, for computation, the high dimension
that we definitely hope everyone is trying to get
a simpler version. I’ll show you some compression
result try to go that way.>>And this is layer three
This is layer three, which you have seen already; you can see that GoogLeNet actually has these funny features. This one uses layer three as the processing layer, and this one layer four. They are not exactly the same, but what is consistent, again, is the curvature.
>>GoogLeNet definitely has something funny there, these artificial eye-looking things. You have probably seen other Deep Dream images like that from GoogLeNet too, and that is exactly why we look for consistency: we only report the things that are consistent. We are not going to interpret this eye, because we don't see it in the other models' predictions. It might be there, but we just don't have the evidence to say so. For this neuron — I should have shown this first — the image picked out by maximizing the model neuron has a clear signature of curvature, much clearer than the other one, so this really is a curvature neuron.

And there are the other eight neurons; you saw this one and this one. This one basically behaves like a V1 neuron — it is known that some V4 neurons are like V1 neurons. From a design point of view it is not a good idea to have neurons of a given functionality in only one place: if that part of [INAUDIBLE] gets knocked out you are pretty hopeless, so it is good to have redundancy. It is also possible that neurons have plasticity and play multiple roles. So you can see they are all somewhat different, but I think they are characteristic.

To compare, we take these characterizations — nine neurons, so nine images. It is possible that one neuron likes all of them; there is no reason a neuron should like only one exact image, maybe a whole distribution of images excites it. So each neuron has its favorite pattern, and we cross them: feed every neuron's favorite image to every model neuron. You can see that each one really prefers its own image — you see the diagonal. But this group is like similar neurons, they like similar things, and for this group, some images that saturate one group become inhibitory for another group. So it is all there.
To follow up on what Jennifer was saying, we also try to simplify things in order to understand them. One project, by Reza, is compression: we use the drop in accuracy on the thousand-category ImageNet task to decide what to remove, and we remove whole filters rather than just individual weights, which is more effective for getting deep learning onto small devices. With the model compressed by about 90%, you can see the pattern still persists: we removed a lot of filters and the characterization stays, which shows, in yet another way, that there is something real there. It is not exact, but you see the same feature occurring, and we lose very little prediction accuracy. If you look more carefully at the compression results, you see that we remove all the color filters — for a lot of object detection you don't need color, it is the shape — so that makes sense.
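[Editor's toy sketch of pruning whole filters by their effect on accuracy — my illustration of the idea, not the group's method, which is more refined; `model`, the layer name, and `loader` are hypothetical.]

```python
import copy
import torch

def accuracy(model, loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

def filter_importance(model, conv_name, loader):
    """Accuracy drop when each filter of the named conv layer is zeroed out."""
    base = accuracy(model, loader)
    conv = dict(model.named_modules())[conv_name]
    drops = []
    for i in range(conv.out_channels):
        pruned = copy.deepcopy(model)
        pconv = dict(pruned.named_modules())[conv_name]
        with torch.no_grad():
            pconv.weight[i].zero_()
            if pconv.bias is not None:
                pconv.bias[i] = 0.0
        drops.append(base - accuracy(pruned, loader))
    return drops  # small drop -> the filter is a candidate for removal

# Usage sketch (assumes a classifier `model` with a conv layer named
# "features.0" and a validation `loader` of (images, labels) batches):
# drops = filter_importance(model, "features.0", loader)
# prune_order = sorted(range(len(drops)), key=lambda i: drops[i])
```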
We have another approach — we submitted it to a workshop and it got accepted — where we also use the classification labels to give the features meaning: if a feature is very important for classifying a particular label, that is almost a statement of the feature's functionality. So we use both visualization at the input and the labels at the output to characterize the features.

And we really hope to do experiments. I see all this work on predictability and stability as a kind of scientific recommendation system: we want to recommend things for people to test in experiments. We are now claiming that these images characterize the neurons, and we can actually test that by feeding them back. That is a real problem — people don't know how to probe a neuron; they use synthetic stimuli, which are not very natural, but what else do you do? This gives you a data-driven way of generating images to put back into the loop to test the claim. But Jack is out of the electrophysiology experiments, so I need to find another collaborator to do that. With today's computing you could do it in real time, which would be a real test. The other thing: since this is about V4, we can go back to the movie project and really improve the reconstruction using the V4 area of the fMRI, which we didn't use before because we didn't have a good forward model for V4. So these are two testable experiments we could do.
The last bit is computability. You have the Turing machine and Turing computability, but I am going to take a very practical view: computability means reaching convergence in your optimization method [LAUGH]; otherwise, for my purposes, it is not computable — I tried to relate the two more formally but couldn't make it work. So we look at algorithmic stability, which is delete-one stability; there has been quite a bit of work relating that to generalization error, and we add another link: how to relate convergence rate to algorithmic stability. We basically have an inequality bounding the convergence rate in terms of the stability. For a class of optimization problems, using a stability bound and an estimation bound, we can show that a very stable algorithm cannot converge too fast — which is completely reasonable, but now you can prove it precisely. We analyze some specific gradient algorithms, and the paper is being written.
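[Editor's note, not a formula from the talk: the notion being invoked is delete-one (uniform) stability, whose classical consequence for generalization — due to Bousquet and Elisseeff, stated here only up to constants — is sketched below; the talk's new result is a complementary inequality relating this same stability to the optimization convergence rate.]

```latex
% Uniform (delete-one) stability of a learning algorithm A on samples of size n:
%   sup_{S, z, i} | \ell(A(S), z) - \ell(A(S^{\setminus i}), z) | \le \beta_n .
%
% Classical consequence (Bousquet & Elisseeff, 2002), up to constants:
% with probability at least 1 - \delta over an i.i.d. sample S of size n,
\[
  R\big(A(S)\big) \;\le\; \widehat{R}_n\big(A(S)\big) \;+\; 2\beta_n
    \;+\; O\!\left( \big(n\beta_n + M\big) \sqrt{\tfrac{\log(1/\delta)}{n}} \right),
\]
% where R is the population risk, \widehat{R}_n the empirical risk, and M bounds the loss.
% The talk's point is a bound in the other direction: an algorithm with very small
% \beta_n also cannot decrease its optimization error arbitrarily fast.
```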
So, to summarize: I hope I have given some evidence that predictability followed by stability is a good workflow — really use machine learning, and then bring back the other statistical considerations. We have also tried it on genomics and text-mining projects. The Stability Principle is, you could say, vague enough that there is a lot of room for people to improvise in applying it to their problems: it doesn't dictate which perturbations you use, it just forces you to think about what is appropriate, and it is actually a huge amount of work to really make the case. We ended up with testable characterizations of neurons for follow-up experiments, and I think there is a lot of theory that can be done.

I do feel that CNNs trained on ImageNet capture something about primate visual processing. I talked to some students who have been trying to use ImageNet for radiology tasks: they start from an ImageNet-trained net and then fine-tune it for a particular task, say tumor detection. You have a rough vision model from ImageNet and then you tune it for the task — in a sense you have an average human vision model from ImageNet and then you tune it to be a radiologist. For visual tasks, the ability to capture something is pretty amazing. I was surprised; I am usually the last one to jump into other areas, but with evidence, I did.

Finally, as I said, at Berkeley we have been working on this for the past three years. The intellectual vision is the integration of computer science and statistics in the context of domain problems, and now we want a structure to support it; the report, 105 pages, is on my website. Data 8 has entered its second year, serving 1,000 students a semester, so 2,000 students a year. I was on the team, with two other faculty, that designed DS100 last semester, and it now has 250 students, and David is the interim dean. We have a new data science major in Letters and Science being approved, and there is going to be discussion about engineering also having a version of data science. So it is really quite exciting, and a lot of us are involved. These are the links, and I should also acknowledge the NSF center that supports the neuroscience work. Thank you.
>>[APPLAUSE]
>>Questions? I managed to finish very quickly. Yes, Andrew?
>>The aspect of stability that you focus on seems very closely related to the notion of regularization in some of the early work on radial basis functions for modeling human vision — like the work of Tomaso Poggio and [INAUDIBLE]. Is that the same idea, or is there something else to it?
>>Well, I have tried to build a platform to unify things, definitely. But I think the earliest reference for this kind of stability is actually ridge regression, in [INAUDIBLE]'s paper from 1943: he was solving an integral equation, he linearized it, and then, for numerical stability reasons, he added the ridge penalization. So that is regularization. A lot of the time I feel that Bayesian models, which people have long been doing, are almost a form of regularization too. But the hard thing with Bayesian models is that once you have more than two layers of hierarchy, I have a hard time knowing how the regularization from the different layers interacts, so it is a bit of a black box there as well.
>>But is the basic idea that you basically want to penalize some kind of-
>>I think that is one way of doing it, but I want to be broader than that. For example, include the mechanistic models for the robot arm: you can see it as regularization in the sense that you simulate data from the model and you pull your estimate toward data that is no longer in analytical form — more like pulling it toward another cloud of data. Better than not having it, right? It is not exactly right, but pulling in that direction helps you, and it is not really analytical. So I hope to be broader than just analytical regularization. The same with data fusion: you can say it is also a form of data perturbation. So I try to be very broad, so that we have a common language to talk about these different things and maybe compare them — how much is another data set worth compared with an analytical method, can you map between them? It just seems very common-sense, right? That is why I talked to a philosophy-of-science colleague recently and said somebody must have talked about this in philosophy of science. He said, yeah, Plato did. I said good, that is a good person to cite. And, yes?
>>I'd like to ask a technical question-
>>Yeah.
>>About the neuro work you were doing with Jack. Am I right that what you are doing is basically reconstruction? By looking at it neuron by neuron, or voxel by voxel, you are actually reconstructing the image that is perceived in the brain, right?
>>You hope. Yeah, that is the-
>>That is what the replication is showing, right? Your reconstruction is showing that.
>>I am saying that I distill all of these different things into something stable, right? But the ultimate test is to feed my images back and see whether the neurons get excited.
>>But aren't the images that you reconstructed-
>>That is the first project?
>>Right, the first project.
>>Yeah.
>>You reconstructed that from looking at the voxels that were stimulated-
>>Right.
>>Using your fMRI, right?
>>Yeah.
>>It's the BOLD signals.
>>Yeah.
>>You're looking at those signals and then you're reconstructing the image based upon that?
>>Yeah, that is why Time Magazine called it mind reading.
>>Right.
>>But it is not really that yet — though people got quite scared.
>>Why isn't it mind reading?
>>I just feel there are so many other things, like attention, that we didn't take into account.
>>Well, let me give you an example of where I think this could be mind reading. Suppose you were doing an fMRI of a person who is sleeping and dreaming.
>>Yeah.
>>Would your method actually be able to reconstruct images of the dream?
>>People have done follow-up studies: they wake people up and ask them what they dreamed.
>>[LAUGH]
>>So they did do that, but it depends on how much you remember, right? You never know exactly, but you can get quite close. I think the precision still has to be improved, though.
>>It looked pretty precise to me.
>>You'd think so, but that is for human figures, because there are a lot of YouTube clips of people talking to the camera — that is where we do best. If we only concentrated on humans talking to a camera, I think we could do a lot better.
>>For example, could you [INAUDIBLE] distinguish between a person and a building?
>>Yeah, and also a blob of an animal, something like that.
>>Could it distinguish whether you dream of your spouse or somebody else?
>>No, that's true.
>>[LAUGH]
>>But let me-
>>Future detector. [LAUGH]
>>What if we read a paper where you were actually able to demonstrate that you can use this method to detect whether or not somebody was dreaming about a car, a building, blobs moving in a particular way? Even that would be extraordinary, don't you think?
>>Yes, but I think we're not there yet in terms of precision. It would definitely cause a lot of interest, but I don't know. On one hand I am very curious to go there; on the other hand I am kind of scared to go there — if I start working on that, people will be following me around.
>>[LAUGH]
>>Jack's group has moved on to text, audio, imagery and more, so nobody seems to want to really push that application; I think the implications could be quite scary.
>>Yes.
>>Yeah, so we chose. But I do want to improve the precision — that is why people are so surprised you can get that far. As I told you, it is pretty simple. Of course, a lot of things look simple after you have done them [LAUGH], but ahead of time it was quite surprising that we could do it with the one million movie clips — it seemed impossible. It is the prior visual experience stored in that one million. But in 26,000 dimensions, one million just disappears: nobody is close to anybody else, everyone is very far apart. So you end up with this averaging, which is not visually appealing; with a lot more data we could really do it, for now we kind of just take the mean. That is why I have started looking at the problem again — I think there might be better ways to combine things.
>>Well, if you look at the cortex and its layered structure — five, six layers there — you have a tremendous capacity to store a library of these movies that you are drawing from.
>>Yeah, and a lot of it is subconscious. We are kind of like a camera taking everything in, and I am always surprised: sometimes I see something, and something comes back from my childhood that I felt I had not thought about for decades, and it just comes back.
>>So if I look at an image and you record what you see in the fMRI, is it clear that my brain pattern would look exactly the same if I now project an image from my memory back into my visual cortex? Because one comes in from the eyes and the other comes down from higher cortex. That is what I am asking: when I imagine an image, will it actually look the same? Do you have any evidence that that is the case?
>>I think people have probably studied that; I don't know off the top of my head, but I think people have probably studied that question — what comes from memory recall. I don't know.
>>You could do that very easily in your setup right now: put a subject in the MRI and ask them to recall an image of a bear, then do your voxel analysis and see whether or not you can reconstruct that image from their visual cortex.
>>Yeah, but "bear" is very weak. You can try that, but if you don't find anything it doesn't mean they couldn't do it, right? Because with "bear", your past experience comes in.
>>You could make it much easier, right? You could say: I have ten images — one with bears, one with a car, and so on. You ask them to look at the images, then you put the images away and say, please recall one of the images.
>>Yeah, that's more precise. Well, we would need money to use the fMRI machine.
>>[LAUGH]
>>Usually it is not in a statistics budget. But yeah, I would ask around, because somebody might have done it — I would be very surprised if people haven't done that. Yes?
>>Is there an assumption here about each person? In our conscious mind, when we think about an image, we imagine there is only one image in our mind at a time. But when you are thinking about something and you get sidetracked and start thinking about something else, there is a moment when you have two things in your head: the neurons that were firing for whatever you were previously thinking about, and some new neurons now firing — and that is a synaptic response, so one response is receding as the other is rising. There is some sort of mixture model happening. I am wondering whether you have considered this at all, because perhaps something you are picking up as noise is concurrent thought. People say the auditory cortex can only process one language at a time: somebody can't be speaking to you, and someone else too, while you write something — some people can do this, but there is a lot of work showing you only have one processor. But I don't know about subconscious thought, or just what happens in our own mind when we think about experiences or images. If we are trying to read that out as an image from the brain, perhaps you could be picking up two separate images firing, especially as we are changing thoughts or thinking about new things. I don't-
>>Definitely it is a possibility; we haven't dealt with that. But I think the bigger problem is actually attention. Sometimes you stare at the screen — this relates to what you are saying — but you are thinking about something else. There are attention trackers, which we don't have in this experiment. So I think that is the bigger problem: your mind wanders, you are looking at the screen but you are really not seeing it. Attention definitely already plays a role. But for what you are describing, I feel you would need much more precise measurements than we have to be able to decouple the two.
>>Yeah.
>>People talk about 7T, but it hasn't really happened, because you have this trade-off — I haven't heard of people using 7T machines much; it just seems hard to make work, it is very sensitive. So with more precise, or maybe multi-modal, measurements, perhaps; but with the data we have I think it would be very hard to see such an effect. That is just my guess. A lot of the time, maybe the reason the model doesn't work is simply that the person spaced out, right? They have been in there for two hours, even if they are fixating — they were a few graduate students; who is going to fixate for that long? Maybe they just got tired. Actually we can maybe see the [INAUDIBLE], things getting worse over time or something. With better experimental data I think in theory we could look at this, but for this data I think it would be very hard to find. Interesting idea.
>>Okay, so let's thank Bin again.
>>Thank you.
>>[APPLAUSE]
