Discussion:
Question regarding 'sched: RT throttling activated'
Wolfgang Wallner
2012-09-11 11:38:02 UTC
Permalink
Hello all,

I have a few questions about the logging entry "sched: RT throttling activated" which I observe, and hope this is the correct place to ask :)
I will start by describing how i come to see this logging entry:

I use an industrial PC with an 2.6.33 RT PREEMP kernel to run a real time application (based on the openPOWERLINK stack [1]), which consists of a kernel part and a user space part.
My application uses the openPOWERLINK stack to exchange data with other real time nodes in the network (a good overview how this is done is given on wikipedia [2]).

When I let my application run overnight, I always see that it stops working after some hours, and the output of dmesg shows the following:

[ 622.722690] EdrvInitOne waiting for link up...
[ 624.439269] EdrvInitOne finished with 0
[ 624.439334] EdrvInit local MAC = 00 60 65 06 B2 25
[ 624.454662] EplApiProcessImageAlloc: Alloc(640, 1100, 2, 2)
[ 624.454672] EplApiProcessImageAlloc: Alloc(eeea2000, 640, eef36800, 1100)
[ 624.477242] ifconfig epl 192.168.100.240 returned 0
[ 635.058012] epl: no IPv6 routers present
[60382.945209] sched: RT throttling activated
[60382.945209] EplTimerHighResk: Continuous timer (handle 0x10000001) had to skip 836 interval(s)!


The first 7 entries are the normal logging output from the powerlink stack as it starts up. I just copied them into the mail for completeness.
The last line is debugging output of the powerlink stack as it complains that its timer was not handled for some time.

The line with "sched: RT throttling activated" is the one I'm interested in.
I don't know what it means, and searching for this term did not turn up much.


My questions are now:

* What does this logging entry mean?
Could you please point me to some information about RT throttling so that I can understand what's it about?

* How to debug this issue?
At the moment I don't know what is cause and what is effect.
I'm not sure if I have a misconfiguration somewhere on my linux system and this influences my application or vice versa.

Any tips on how to find out what's going on are greatly appreciated :)


Regards,
Wolfgang Wallner

[1] http://sourceforge.net/projects/openpowerlink/
[2] http://en.wikipedia.org/wiki/Ethernet_Powerlink--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
John Kacur
2012-09-11 12:10:20 UTC
Permalink
Post by Wolfgang Wallner
Hello all,
I have a few questions about the logging entry "sched: RT throttling activated" which I observe, and hope this is the correct place to ask :)
I use an industrial PC with an 2.6.33 RT PREEMP kernel to run a real time application (based on the openPOWERLINK stack [1]), which consists of a kernel part and a user space part.
My application uses the openPOWERLINK stack to exchange data with other real time nodes in the network (a good overview how this is done is given on wikipedia [2]).
[ 622.722690] EdrvInitOne waiting for link up...
[ 624.439269] EdrvInitOne finished with 0
[ 624.439334] EdrvInit local MAC = 00 60 65 06 B2 25
[ 624.454662] EplApiProcessImageAlloc: Alloc(640, 1100, 2, 2)
[ 624.454672] EplApiProcessImageAlloc: Alloc(eeea2000, 640, eef36800, 1100)
[ 624.477242] ifconfig epl 192.168.100.240 returned 0
[ 635.058012] epl: no IPv6 routers present
[60382.945209] sched: RT throttling activated
[60382.945209] EplTimerHighResk: Continuous timer (handle 0x10000001) had to skip 836 interval(s)!
The first 7 entries are the normal logging output from the powerlink stack as it starts up. I just copied them into the mail for completeness.
The last line is debugging output of the powerlink stack as it complains that its timer was not handled for some time.
The line with "sched: RT throttling activated" is the one I'm interested in.
I don't know what it means, and searching for this term did not turn up much.
What kind of a search? A grep of "RT\ throttling" would have found it.
Post by Wolfgang Wallner
* What does this logging entry mean?
Could you please point me to some information about RT throttling so that I can understand what's it about?
in the kernel, look at Documentation/scheduler/sched-rt-group.txt
Post by Wolfgang Wallner
* How to debug this issue?
At the moment I don't know what is cause and what is effect.
I'm not sure if I have a misconfiguration somewhere on my linux system and this influences my application or vice versa.
Any tips on how to find out what's going on are greatly appreciated :)
It is considered a feature that allows non-RT tasks to make some progress,
so that poorly written user space programs can't totally freeze your
system - which would often be blamed on the rt kernel. You can shut this
off for a harder kind of real-time - in which case, if your system is freezing,
turning it back on could help to show whether the kernel is really
freezing, or if it is a result of a problem with the user-space program.

John
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Mike Galbraith
2012-09-11 13:20:17 UTC
Permalink
Post by Wolfgang Wallner
Post by Wolfgang Wallner
Hello all,
I have a few questions about the logging entry "sched: RT throttling
activated" which I observe, and hope this is the correct place to
ask :)
Post by Wolfgang Wallner
I use an industrial PC with an 2.6.33 RT PREEMP kernel...
BTW, there have been a few throttle fixes since then that you'll likely
want to backport. I don't have a list handy, but they're easy to find
with git.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Mike Galbraith
2012-09-11 13:05:02 UTC
Permalink
Post by Wolfgang Wallner
Hello all,
I have a few questions about the logging entry "sched: RT throttling
activated" which I observe, and hope this is the correct place to
ask :)
I use an industrial PC with an 2.6.33 RT PREEMP kernel to run a real
time application (based on the openPOWERLINK stack [1]), which
consists of a kernel part and a user space part.
My application uses the openPOWERLINK stack to exchange data with
other real time nodes in the network (a good overview how this is done
is given on wikipedia [2]).
When I let my application run overnight, I always see that it stops
[ 622.722690] EdrvInitOne waiting for link up...
[ 624.439269] EdrvInitOne finished with 0
[ 624.439334] EdrvInit local MAC = 00 60 65 06 B2 25
[ 624.454662] EplApiProcessImageAlloc: Alloc(640, 1100, 2, 2)
[ 624.454672] EplApiProcessImageAlloc: Alloc(eeea2000, 640, eef36800,
1100)
[ 624.477242] ifconfig epl 192.168.100.240 returned 0
[ 635.058012] epl: no IPv6 routers present
[60382.945209] sched: RT throttling activated
[60382.945209] EplTimerHighResk: Continuous timer (handle 0x10000001)
had to skip 836 interval(s)!
Likely realtime task went gaga. Kernel or userspace doesn't matter.
Post by Wolfgang Wallner
* What does this logging entry mean?
Could you please point me to some information about RT throttling
so that I can understand what's it about?
With stock settings, it means realtime task[s] consumed > 95% of the
throttle interval (1s), so the throttle activated, allowing SCHED_NORMAL
tasks to have a sip of CPU, to let you try to save the box from nutty RT
CPU hogs. See kernel/sched_rt.c.
Post by Wolfgang Wallner
* How to debug this issue?
At the moment I don't know what is cause and what is effect.
I'm not sure if I have a misconfiguration somewhere on my linux system
and this influences my application or vice versa.
Any tips on how to find out what's going on are greatly appreciated :)
Hm, I would revoke godlike powers until the recipients can be trusted to
use them wisely. 100% CPU utilization being considered highly unwise.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Wolfgang Wallner
2012-09-11 15:34:12 UTC
Permalink
Hello,

thanks for your feedback.
Post by John Kacur
Post by Wolfgang Wallner
The line with "sched: RT throttling activated" is the one I'm interested in.
I don't know what it means, and searching for this term did not turn up much.
What kind of a search? A grep of "RT\ throttling" would have found it.
Sorry, I was not aware of this (I'm not used to working with kernel sources), I did a web search for "sched: RT throttling activated".
I will consider greping the kernel sources next time (to be clear: you did mean a grep in the kernel sources?).
Post by John Kacur
Likely realtime task went gaga. Kernel or userspace doesn't matter.
Post by Wolfgang Wallner
* What does this logging entry mean?
Could you please point me to some information about RT throttling
so that I can understand what's it about?
With stock settings, it means realtime task[s] consumed > 95% of the
throttle interval (1s), so the throttle activated, allowing SCHED_NORMAL
tasks to have a sip of CPU, to let you try to save the box from nutty RT
CPU hogs. See kernel/sched_rt.c.
I think if the application turns into a cpu hog, this would be visible in a sched_switch ftrace.
Thus, I added the line tracing_off(); in kernel/sched_rt.c at the place where the "sched: RT throttling activated" string is printed and recompiled my kernel. I will let the sched_switch tracing run togheter with my application until tomorrow and hopefully see in the trace what's wrong.

I did not change the default throttling settings. Thus, the error should be visible in the resulting trace about 0.95 seconds before the end of the trace.

Is this approach reasonable? Any other suggestions?
Post by John Kacur
BTW, there have been a few throttle fixes since then that you'll likely
want to backport. I don't have a list handy, but they're easy to find
with git.
Thanks for the tip, I will have a look at it.
Will it be enough to watch out for changes in sched_rt.c or is throttling also handled elsewhere?

regards, Wolfgang


-----linux-rt-users-***@vger.kernel.org schrieb: -----
An: John Kacur <***@redhat.com>
Von: Mike Galbraith
Gesendet von: linux-rt-users-***@vger.kernel.org
Datum: 11.09.2012 15:20
Kopie: Wolfgang Wallner <***@br-automation.com>, linux-rt-***@vger.kernel.org, Josef Baumgartner <***@br-automation.com>
Betreff: Re: Question regarding 'sched: RT throttling activated'
Post by John Kacur
Post by Wolfgang Wallner
Hello all,
I have a few questions about the logging entry "sched: RT throttling
activated" which I observe, and hope this is the correct place to
ask :)
Post by Wolfgang Wallner
I use an industrial PC with an 2.6.33 RT PREEMP kernel...
BTW, there have been a few throttle fixes since then that you'll likely
want to backport. I don't have a list handy, but they're easy to find
with git.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Mike Galbraith
2012-09-11 18:10:29 UTC
Permalink
Post by Mike Galbraith
Post by Mike Galbraith
Post by Wolfgang Wallner
* What does this logging entry mean?
Could you please point me to some information about RT throttling
so that I can understand what's it about?
With stock settings, it means realtime task[s] consumed > 95% of the
throttle interval (1s), so the throttle activated, allowing
SCHED_NORMAL
Post by Mike Galbraith
tasks to have a sip of CPU, to let you try to save the box from
nutty RT
Post by Mike Galbraith
CPU hogs. See kernel/sched_rt.c.
I think if the application turns into a cpu hog...
Where from comes 'if'? You presented evidence, so methinks there's not
a _lot_ of room for an 'if', there's just a missing 'why'.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Sven-Thorsten Dietrich
2012-09-11 18:55:38 UTC
Permalink
Post by Mike Galbraith
Post by Mike Galbraith
Post by Mike Galbraith
Post by Wolfgang Wallner
* What does this logging entry mean?
Could you please point me to some information about RT throttling
so that I can understand what's it about?
With stock settings, it means realtime task[s] consumed > 95% of the
throttle interval (1s), so the throttle activated, allowing
SCHED_NORMAL
Post by Mike Galbraith
tasks to have a sip of CPU, to let you try to save the box from
nutty RT
Post by Mike Galbraith
CPU hogs. See kernel/sched_rt.c.
I think if the application turns into a cpu hog...
Where from comes 'if'? You presented evidence, so methinks there's not
a _lot_ of room for an 'if', there's just a missing 'why'.
Well, as you know a lot of folks do polling, but its important to make sure not ALL cores are doing that...

Need to look at the code, maybe we could add a mask to restrict throttling per cpu (e.g. to 0) ?

Sven
Post by Mike Galbraith
-Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Mike Galbraith
2012-09-12 03:40:51 UTC
Permalink
Post by Sven-Thorsten Dietrich
Post by Mike Galbraith
Post by Mike Galbraith
Post by Mike Galbraith
Post by Wolfgang Wallner
* What does this logging entry mean?
Could you please point me to some information about RT throttling
so that I can understand what's it about?
With stock settings, it means realtime task[s] consumed > 95% of the
throttle interval (1s), so the throttle activated, allowing
SCHED_NORMAL
Post by Mike Galbraith
tasks to have a sip of CPU, to let you try to save the box from
nutty RT
Post by Mike Galbraith
CPU hogs. See kernel/sched_rt.c.
I think if the application turns into a cpu hog...
Where from comes 'if'? You presented evidence, so methinks there's not
a _lot_ of room for an 'if', there's just a missing 'why'.
Well, as you know a lot of folks do polling, but its important to make
sure not ALL cores are doing that...
[60382.945209] EplTimerHighResk: Continuous timer (handle 0x10000001)
had to skip 836 interval(s)!
That and 'stops working' (as in forever) made me suspect spinner rather
than transient cpu over-commit.
Post by Sven-Thorsten Dietrich
Need to look at the code, maybe we could add a mask to restrict
throttling per cpu (e.g. to 0) ?
IMHO it's a debugging tool you turn off for normal operation. Biggest
problem I've ever had with the thing is it allowing hogs to run over to
the neighbors to borrow a cup of CPU. That's fixed.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...