what is learning ?

I said a while back that learning is changing a variable, and I believe that holds true in the most general, abstract sense. Something has to change when something is learned. This definition doesn’t help much though. These are some questions that need answers for it to be more useful:

  1. The variable that has to change, let’s call it x, has to have an initial value. What should that value be? Why? What does it represent? --> In my system the abstract value x has an initial value that would allow a single synapse to fire a postsynaptic neuron, and it is constrained by some arbitrary values for the activation potential and the initial “glutamate” concentration in the postsynaptic axon.
  2. When to change x? This is not clear at all. The input value that would alter x, in this case the firing frequency of the presynaptic neuron, is variable. That input value rarely “accommodates” the x variable, but how much deviation from x should be allowed before changing x? This is the case where the decision is made locally and relies only on the deviation from x. Relying only on the local environment to change x does not seem to work. Temporarily changing x to alert another “decision node” seems more useful, less random. Yet this is just deferring the decision about making a decision. Another “decision node” will just be the same problem where instead of x we have y. Deferring a decision should end somewhere, at some other variable z that is in fact a constant. Any push to change z should result in a feedback loop that promotes action to change the input.

I see some benefits in having multiple decision nodes that make changes in short feedback loops, plus a final decision node that would promote action, yet this is still not enough and not clear. Another way of framing question 2 would be: “What is feedback?” While it is the same question, the answer does not seem to be the same. Feedback to what? When you say “This is not a cat”, should this alter multiple decision nodes or just one? The reason it is not a cat can be many things… Then what to change?

  3. How much to change x? Looking for a minimum? That does not seem feasible because it takes a long time to accomplish anything. Adding a single AMPA receptor on the postsynaptic side of a synapse has a big effect on the final output; it is not like adding a single calcium ion to the mix. I have no clear idea about this. When I change x in my system, the change follows an exponential decay, but I have not found a dx that would have a link to the rest of the system.

  4. When to stop changing x? Just because I detect an increased frequency (increased relative to the expected value, that is) does not mean that I should change x indefinitely, yet how do I know the change I made is enough? Is it still based on the expected value of the next decision node? That node would change x so that its expected value is met, while asking nodes further away if it should also change the expected value?
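To make questions 2 to 4 a bit more concrete, here is a minimal sketch of a purely local rule. It is not the code of this project: the tolerance band, the `rate` parameter and the function names are all assumptions made for illustration. x is left alone while the input stays within the tolerance, otherwise the remaining deviation shrinks by an exponential-decay factor each cycle, and the change stops by itself once the input is back inside the band.

```python
import math


def update_x(x, observed, tolerance, rate):
    """Return the next value of x under a hypothetical local rule.

    tolerance: largest deviation from x that is accepted without any change.
    rate: decay constant; each update shrinks the deviation by exp(-rate).
    """
    deviation = observed - x
    if abs(deviation) <= tolerance:
        # Input is "accommodated" by x: nothing is learned.
        return x
    # Exponential decay of the deviation toward the observed value.
    return observed - deviation * math.exp(-rate)


def cycles_until_stable(x, observed, tolerance, rate, max_cycles=1000):
    """Apply update_x with a constant input until x stops changing."""
    for cycle in range(max_cycles):
        new_x = update_x(x, observed, tolerance, rate)
        if new_x == x:
            return cycle, x
        x = new_x
    return max_cycles, x


if __name__ == "__main__":
    cycles, final_x = cycles_until_stable(10.0, 20.0, tolerance=1.0, rate=0.5)
    print(cycles, final_x)
```

The sketch only shows the local version of the decision. The tolerance and the rate are still arbitrary numbers here, which is exactly the open problem: nothing links them to the rest of the system.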

Since my last post I haven’t done any work on the code side of the project. I initially had a theory of how this should work, but that proved to be too simplistic. Then I programmed everything I could think of based on what is known in biology, hoping that by doing so something would eventually start making sense, but that was not the case either. Nothing has become clear. Now I’m again trying to understand the basics of the problem and change the code to fit a certain theory 🙂

what is the role of LTP/LTD ?

Still unknown :(. I have assigned some roles based on what I read in literature, but it is still unclear what role it plays in information processing. I have two opposite views..

  1. Could be that through LTP/LTD, the range of “states” for a synapse can be enlarged and would not play a role in learning, at least not directly. That would imply that both processes should be fast, easy to implement, flexible and equivalent in terms of the default (or most likely) state.
  2. They play a role in learning, as such, the state of a synapse should be difficult to change through LTP/LTD and would have a meaning (would represent an internal reference frame).

Since I have not decided which one it is, I implemented the algorithms to behave somewhere in between, but that does not seem to accomplish much… In short, I have made no real progress.

Long Term Potentiation/Depression

Any type of modification for synaptic “strength” leads to modification of the firing rates, so much so that a specific pattern cannot be correlated any longer with a firing rate. That means “learning” has to be limited to perfect timing and the formation or breaking of synapses. I’m not sure what to make of it…

I have assigned 2 roles for LTP/D:

  1. Both will contribute to synchronization of firing events by modifying the frequency of firing – this requires a firing event to take place.
  2. Both will adjust the synapse to high/low frequency leading to breaking of a synapse in both cases, very high or very low frequencies. This does not require a firing event.

I have run simulations where 4 synapses linked to a single neuron fire at different frequencies, and there are two scenarios:

  1. The neuron will remove synapses with lower frequencies and the synapse with the highest frequency will remain, but there are cases where that synapse is also removed because it cannot activate the neuron by itself any longer.
  2. The neuron will go through LTP/D events that are canceling out, synapses are not removed but the neuron will activate with variable firing rates. As far as I can tell at this time, a neuron firing at variable firing rates cannot form synapses with the next post-synaptic neuron, so I’m inclined to get rid of this option…

There should be a third option where close enough firing rates of the 4 synapses can be accepted by the neuron, resulting in a single firing rate, because they should be within the adjusting power of LTP/D, but I have yet to find such a scenario in practice.

Still many issues to investigate and solve… finding a good implementation for LTP/D is going to take much longer than I anticipated..

Long Term Potentiation/Depression

Both continue to elude me… So far I have 2 preliminary conclusions, or maybe just hypotheses…

  1. Both LTP/D serve to bring neurons to firing synchronization, which I actually understand as a way to determine which inputs are part of a pattern and which are not. As long as neurons fire within a certain time frame, that time frame is minimized through LTP and that minimization, dt, is the uncertainty in recognizing a pattern. LTD also is trying to bring the firing timing within that dt frame or to remove that association entirely.
  2. Both LTD/P activate when dealing with high/low frequency. Low frequency should mean: not urgent, don’t propagate fast or through deep layers. High frequency means: this is important, go fast and deep, reach a decision node and see if action is necessary. I also observed that high (relative) frequency overflows the neuron with too much potential. The net result is that 2 different patterns with high, above-threshold energy cannot be distinguished, which in fact requires an LTD event. When I deal with low frequency (presynaptic activation that does not end up activating the postsynaptic neuron), I delete that synapse, though in multiple steps.

To me, the second point is difficult to reconcile: why would they be clustered into the same mechanism? Are they the same mechanism, or do they appear to be the same but act on different chemicals / mechanisms…

I found a group in Israel, professor Ido Kanter’s group. Their work seems to be very useful for my project; I could not find that kind of information anywhere else. Very much appreciated. Thank you guys.

still no progress :(

Something is still missing, but I can’t put my finger on it… There is no clear way to decide when a pattern is learned… In the absence of a clear criterion that would stop the learning and preserve the variables, the pattern keeps getting broken down into smaller parts until there is a single pre-synaptic neuron for a single post-synaptic neuron… and even that last link breaks, resulting in no connection between layers… I believe I mentioned this problem more than a year ago and now I’ve come full circle back to the same problem… The equations for synapses are different, the conditions for initiating or breaking connections are different… yet this problem remains… I’ve looked back through my notebooks and found no notes commenting on this problem. It seems I was not sure this was a real problem… No solution seems obvious at this point. Stopping the learning process at an arbitrary time has many drawbacks, but I see no other way of going forward…

make it or break it – take 2

Yup, that time has come. No more wiggle room. This should be the first step in what I want to build… a new breed of AI. This first step should show how learning is accomplished in such a system. Signals should be classified and recognized as the same when seen again. The first showcase I want to build is 2-color recognition on a 3-layer network: L1 2×2, L2 5×5 and L3 2×2, with a single inhibitory neuron on layers 2 and 3. I’m also considering showing angle recognition on a 3×3, 3×3, 1 network configuration.

I did not add color vision but I added a simpler simulation where different “pigments” (as in cone cell pigments) would result in different firing rates / neuron.

This could take days or months or maybe it won’t work at all.

make it or break it time

I’m nearly there, about two more weeks to get there. I worked a lot, but mostly doing simulations. The problems more or less remained the same:

  • Inter-layer transmission of signal. Because activation frequency decreases with each layer, in layer 3 I already get to the point where the frequency is too low to activate neurons, and synapses are classified as low frequency and removed. Increasing some sort of synaptic strength did not work because that increase is limited; it can’t be something arbitrary. Same-layer connections should work as an amplifier, but they suffer from bad timing. There are mechanisms that should synchronize the firing events, but they too fail more often than not. I still have hopes of improving on this though. Another mechanism could be a persistent reinforcing feedback… Every 3 layers or so, the third layer should feed back to the first layer, increasing the frequency for both. It seems far-fetched, but I’m close to implementing this option too.
  • Inhibition – I’m still looking for the proper level of inhibition. How many synapses should act together to overcome inhibition (meaning that, in spite of inhibition being present and limiting synaptic potential, the neuron should still fire by adding up many small synaptic signals)? For now I believe that this should be the desired mechanism, rather than the “all or nothing” mechanism I had so far. The all or nothing is still present for fewer activating synapses. But the number of synapses needed to cooperate is still unclear… it has to be related to some minimum frequency.
  • Synaptic frequency – a single synapse seems to be very limited in the values it can take. In theory I could have around 100 values, but in practical terms that number is even lower, around 20 values. Messing with this number is tricky… There is a hard theoretical limit for the highest and lowest values, but if I increase the value, I also increase the computation time and limit the inter-layer transmission. I believe I can find a reasonable compromise, but right now I feel severely limited by this problem.
  • Synaptic connections – this has now become the most pressing issue and the most complicated. Making and breaking synapses on large matrices is slow and requires me to keep track of every connection. A single neuron can connect to a lower layer, the same layer and an upper layer, plus an inhibitory layer, all at the same time. So far I used a simplification: once removed, a synapse would not form again. But that was putting serious pressure on the mechanism for timing the connections, meaning a connection had to be the best connection possible from the first time. That is not working right. I tried to wait long enough for all connections to form in the first layer and only after that bind to the second layer. There are too many cases to consider. So now I let connections form and break forever. But keeping track of all of them is difficult. I still wonder how two neurons close by stop forming an infinite number of connections between them. They form many, to be sure, but not a fixed or an infinite number. So when does the process stop, and why? If I don’t keep track of every synapse, two neurons would keep forming synapses between themselves.
  • Training patterns – I thought they could be anything. I’m not so sure anymore. Signals received from the retina are heterogeneous because there are 3 types of pigments for the cone cells and 2 types of bipolar cells. This results in a complex pattern even when you deal with a very uniform input pattern (say, watching a white piece of paper). Having a heterogeneous signal works, while a homogeneous signal leads to “bad timing”: close-by neurons receiving exactly the same signal cannot form connections among themselves. I have a workaround for this but I don’t like it, because what I see as input is not the real input (the white sheet of paper, say); I need to imagine the input… that makes understanding very difficult.
  • Small input going into a bigger second layer.. I’ve procrastinated on this. I can create layers of different dimensions but neurons don’t bind with some sort of step. I need to have neurons that don’t get direct connections from the lower layer, or even if they still bind I need to have more neurons in the second layer to act as signal amplifier ..
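The question in the inhibition bullet, how many synapses have to cooperate to fire a neuron in spite of inhibition, can be written down under very simple assumptions. The sketch below assumes that synaptic potentials add up linearly and that inhibition subtracts a fixed amount from the sum. Neither assumption comes from the actual model, and the function and parameter names are hypothetical.

```python
import math


def min_cooperating_synapses(threshold, synaptic_potential, inhibition):
    """Smallest n with n * synaptic_potential - inhibition >= threshold.

    Assumes linear summation and subtractive inhibition (illustrative only).
    """
    if synaptic_potential <= 0:
        raise ValueError("synaptic_potential must be positive")
    needed = math.ceil((threshold + inhibition) / synaptic_potential)
    # At least one synapse has to be active for the neuron to fire at all.
    return max(1, needed)


if __name__ == "__main__":
    # Without inhibition vs. with inhibition, same threshold and synapse size.
    print(min_cooperating_synapses(10, 2, 0))
    print(min_cooperating_synapses(10, 2, 6))
```

In this toy version the “all or nothing” case is simply the one where a single synapse already delivers enough potential. It says nothing about the link to a minimum frequency, which remains open.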

Inhibition

I implemented more sophisticated mechanisms for inhibition… Now an inhibitory neuron is identical in behavior with a regular neuron.

So I implemented 2 mechanisms: 1) the inhibition acts at the neuron body and 2) the inhibition acts directly on synapses. Neither of them makes any sense. When inhibiting the body (1), synapses are not protected by the LTD/LTP effect => the net effect is a decrease in the firing rate and there is no permanent inhibition for any pattern. When inhibition hits, a couple of activation cycles are skipped, so you have patches of decreased firing frequency alternating with no firing. When inhibition is at the synapse level (2), I end up only with skipped activation cycles; the firing frequency remains the same since synapses are protected by LTP/LTD effects.

In both cases I see no use for that delay in activation, because it is not a single pattern that is delayed; all of them are. The separation between patterns is not great (a couple of cycles apart). The problem also comes from inhibitory neurons firing with very low frequency. Their activation depends on the activatory neurons, which themselves have decreased activation as you go up in layer number.

I sort of defined my “objective” function for a synapse:

I also defined LTD and LTP. I initially believed they were just the fitting event for the objective function, but it seems they have multiple roles.

I made a lot of progress but still no “significant” progress..

Long term depression / potentiation

I use these terms very loosely to mean the increase (LTP) or decrease (LTD) of potential delivered by a synapse to the neuron body. I used some forms of LTP/LTD in my previous versions but they were never meant to approach the biological equivalents. Now I have spent some time reading what is known in biology about LTP and LTD, and while there are tons of papers on the subject, I could not find anything that has explored the need for such mechanisms. They are used in “learning”, but that’s a very vague statement. I’ve been thinking and I cannot find a use case for them. I found a use for an LTD event during depolarization of the postsynaptic neuron, but no uses for LTD/P associated with low/high frequency inputs. What is low/high frequency? They seem arbitrary to me. I can use whatever frequency in my code, so I should be able to link these terms with something… but with what?

However, I believe the main reason for having an altered synaptic potential is to change the firing frequency of the postsynaptic neuron… This conclusion troubles me, firing rate is crucial in selecting / separating events (inputs), any alteration (or missing alteration) can be picked up by the inhibitory neuron and amplified, resulting in vastly different results even when the initial change in synaptic output was extremely small. So if I don’t add them now and don’t understand them, they may come back to haunt me…. yet, I don’t need them..

My plan is to explore the implication of LTP and LTD on various other variables but with no certain goal in mind, I find that both boring and difficult.. Is there a paper showing they are actually “long” term changes ? I found a paper saying that very few last up to a week and most alterations vanish within hours. I don’t consider that, long term…

Does LTP/LTD ever stop? With age perhaps? For certain layers maybe? Do they become less frequent? So many questions… Given my difficulties in transporting signal through layers, it is still feasible that some LTP/D events would be extremely difficult to change in deeper layers, so in the end they could be viewed as long term and part of the learning mechanism. So they are long term because they are hard to change… speculations…

Inter-layer transmission

If a neuron in layer 1 (L1) requires 10 inputs to fire, and those 10 inputs are delivered in 10 cycles, another neuron in layer 2, also requiring 10 inputs for activation, is activated in 100 cycles by the neuron in L1… In the third layer, the number of cycles required for activation is 1000… So it cannot work like this.
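The arithmetic can be checked with a few lines. This is only a worked version of the numbers in the paragraph, assuming a chain of single neurons where every neuron needs the same number of inputs and the first layer receives one input per cycle. The function name and parameters are made up for the example.

```python
def cycles_to_activate(layer, inputs_needed=10, cycles_per_input=1):
    """Cycles until a neuron in the given layer (1-based) fires for the first time.

    Each layer has to wait for `inputs_needed` firings of the layer below,
    so the delay is multiplied by `inputs_needed` at every layer.
    """
    if layer < 1:
        raise ValueError("layer numbering starts at 1")
    return cycles_per_input * inputs_needed ** layer


if __name__ == "__main__":
    for layer in (1, 2, 3):
        print(layer, cycles_to_activate(layer))
```

With 10 inputs per neuron this gives 10, 100 and 1000 cycles for layers 1 to 3, so the delay grows exponentially with depth.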

I was aware of this issue from the beginning, but I had hoped I could solve it by increasing synaptic efficiency, so basically a neuron in L2 would require not 10 inputs from L1 but, say, just 1… That would have been acceptable… The problem with this approach became apparent very late: by increasing synapse efficiency, the selectivity of the post-synaptic neuron decreases. So the solution I envisioned proved to be a dead end. Now I’m considering other approaches to deal with this slow transmission from layer to layer…

  1. Have multiple synapses between a neuron in L1 and a neuron in L2. This does not look very promising for various reasons, but maybe in combination with other ideas it could work… not necessarily 10 synapses; even 2 synapses would significantly reduce the delay.
  2. Have many more neurons in L2 than in L1. Those extra neurons would serve as some sort of amplifier: they would bind among themselves and excite each other in a bizarre loop. I have played with such loops in the past but they resulted in continuous excitation. Maybe they could be used to store more patterns too… I was planning to add more neurons in L2 anyway, so I’m more inclined to start with this approach.
  3. Accept a serious reduction of signal in L2… Basically 10 neurons from L1 could link to a single neuron in L2, and that neuron would fire immediately after the 10 neurons from L1 fired because it receives 10 inputs. This could be part of the solution, but I don’t see this as acceptable (this is what is happening right now by default, when there are multiple bindings from L1 to L2).
  4. Something else that is unknown now…
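Options 1 and 3 both come down to delivering more than one input per firing of the lower layer, either through several synapses between the same pair of neurons or through several L1 neurons converging on one L2 neuron. A rough sketch of the effect on the delay, assuming perfectly synchronized inputs of equal weight (an idealization, and the names are hypothetical):

```python
import math


def layer_delay(layer, inputs_needed=10, inputs_per_firing=1, l1_delay=10):
    """Cycles until a neuron in `layer` (1-based) first fires.

    inputs_per_firing: inputs a neuron receives each time the layer below
    fires (parallel synapses or converging neurons, assumed synchronized).
    l1_delay: cycles layer 1 needs to fire from the external input.
    """
    if layer < 1:
        raise ValueError("layer numbering starts at 1")
    # Number of firings of the layer below needed to reach activation.
    firings_needed = math.ceil(inputs_needed / inputs_per_firing)
    return l1_delay * firings_needed ** (layer - 1)


if __name__ == "__main__":
    for k in (1, 2, 10):
        print(k, [layer_delay(layer, inputs_per_firing=k) for layer in (1, 2, 3)])
```

Under these assumptions 2 synapses halve the delay at each layer (50 and 250 cycles instead of 100 and 1000), and 10 synchronized inputs remove the extra delay entirely. The sketch ignores the timing problems and the loss of selectivity, so it only shows the best case.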

I’m also not happy with the inhibitory neurons… By acting fast (requiring just 1 input to go active) and being 100% efficient, they remove some of the learning rules I have envisioned. They are not my immediate focus but they are bothering me.

The new synapse kinetics work extremely well, beyond my expectations.